Preset method

UPerNet

Introduction

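UPerNet is a classic segmentation head, used as the decoder in the Swin Transformer paper. It is built on a Feature Pyramid Network (FPN) and a Pyramid Pooling Module (PPM): the PPM pools the coarsest backbone feature map at several grid sizes to capture global context, and the FPN fuses the backbone feature maps of different scales through a top-down pathway. Combining these multi-scale features improves model performance.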
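The PPM and FPN fusion can be sketched in PyTorch as follows. This is a minimal illustration, not the implementation used by this project or by the original paper: the class names `PPM` and `UPerHeadSketch`, the pooling bins `(1, 2, 3, 6)`, the channel widths, and the omission of normalization layers are all assumptions made to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PPM(nn.Module):
    """Pyramid Pooling Module (sketch): pool the coarsest map at several grid sizes."""

    def __init__(self, in_ch, ch, bins=(1, 2, 3, 6)):  # bins are an assumed default
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, ch, 1), nn.ReLU())
            for b in bins
        ])
        self.fuse = nn.Conv2d(in_ch + len(bins) * ch, ch, 3, padding=1)

    def forward(self, x):
        size = x.shape[-2:]
        # Resize every pooled map back to the input resolution, then concatenate.
        pooled = [
            F.interpolate(stage(x), size=size, mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return self.fuse(torch.cat([x, *pooled], dim=1))


class UPerHeadSketch(nn.Module):
    """Hypothetical UPerNet-style head: PPM on the coarsest level, FPN on the rest."""

    def __init__(self, in_chs, ch, num_classes):
        super().__init__()
        self.ppm = PPM(in_chs[-1], ch)
        self.lateral = nn.ModuleList([nn.Conv2d(c, ch, 1) for c in in_chs[:-1]])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=1) for _ in in_chs[:-1]]
        )
        self.classifier = nn.Conv2d(len(in_chs) * ch, num_classes, 1)

    def forward(self, feats):
        # feats: backbone feature maps ordered from finest to coarsest.
        levels = [lat(f) for lat, f in zip(self.lateral, feats[:-1])]
        levels.append(self.ppm(feats[-1]))
        # Top-down pathway: add each upsampled coarser level to the next finer one.
        for i in range(len(levels) - 1, 0, -1):
            up = F.interpolate(
                levels[i], size=levels[i - 1].shape[-2:],
                mode="bilinear", align_corners=False,
            )
            levels[i - 1] = levels[i - 1] + up
        outs = [s(l) for s, l in zip(self.smooth, levels[:-1])] + [levels[-1]]
        # Bring every level to the finest resolution and fuse by concatenation.
        size = outs[0].shape[-2:]
        outs = [outs[0]] + [
            F.interpolate(o, size=size, mode="bilinear", align_corners=False)
            for o in outs[1:]
        ]
        return self.classifier(torch.cat(outs, dim=1))


# Usage with made-up feature shapes for a four-stage backbone.
feats = [torch.randn(1, c, s, s) for c, s in zip((8, 16, 32, 64), (32, 16, 8, 4))]
logits = UPerHeadSketch(in_chs=(8, 16, 32, 64), ch=16, num_classes=5)(feats)
```

In this sketch the logits come out at the resolution of the finest feature map; a full pipeline would typically upsample them once more to the input image size.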

Citation

@inproceedings{xiao2018unified,
  title={Unified perceptual parsing for scene understanding},
  author={Xiao, Tete and Liu, Yingcheng and Zhou, Bolei and Jiang, Yuning and Sun, Jian},
  booktitle={European Conference on Computer Vision},
  pages={418--434},
  year={2018}
}

SETR

Introduction

SETR is a method designed specifically for ViT (Vision Transformer). It uses progressive upsampling: convolution and upsampling layers alternate to restore the feature maps from the backbone network to the original image size for prediction. Upsampling in several small steps, rather than one large step, reduces noisy predictions.
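A progressive upsampling head of this kind might look like the PyTorch sketch below. It is an illustration under stated assumptions rather than the reference implementation: the class name `ProgressiveUpsampleHead`, the hidden width, the use of ReLU without normalization, and the choice of four 2x stages (which matches a patch size of 16) are all hypothetical.

```python
import torch
import torch.nn as nn


class ProgressiveUpsampleHead(nn.Module):
    """Hypothetical SETR-style head: alternate 3x3 convolution and 2x upsampling."""

    def __init__(self, in_ch, ch, num_classes, num_stages=4):
        super().__init__()
        layers = []
        for i in range(num_stages):
            layers += [
                nn.Conv2d(in_ch if i == 0 else ch, ch, 3, padding=1),
                nn.ReLU(),
                # Each stage doubles the spatial size; 4 stages give 16x overall.
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            ]
        self.stages = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(ch, num_classes, 1)

    def forward(self, tokens, grid_hw):
        # tokens: (B, N, C) patch tokens from ViT, class token already removed.
        b, n, c = tokens.shape
        h, w = grid_hw
        assert n == h * w, "token count must match the patch grid"
        # Reshape the token sequence back into a 2D feature map.
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.classifier(self.stages(x))


# Usage: a 224x224 image with 16x16 patches yields a 14x14 grid of tokens.
tokens = torch.randn(2, 14 * 14, 32)
pup_logits = ProgressiveUpsampleHead(in_ch=32, ch=16, num_classes=5)(tokens, (14, 14))
```

With four 2x stages, the 14x14 token grid is restored to 224x224, the assumed input image size.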

Citation

@inproceedings{zheng2021rethinking,
  title={Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers},
  author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip HS and others},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6881--6890},
  year={2021}
}