Related papers: ContextFormer: Redefining Efficiency in Semantic S…

Segmenter: Transformer for Semantic Segmentation

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to…

Computer Vision and Pattern Recognition · Computer Science 2021-09-03 Robin Strudel, Ricardo Garcia, Ivan Laptev, Cordelia Schmid

Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications

Semantic segmentation has made significant strides in pixel-level image understanding, yet it remains limited in capturing contextual and semantic relationships between objects. Current models, such as CNN and Transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Ben Rahman

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present…

Computer Vision and Pattern Recognition · Computer Science 2022-04-13 Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

A Unified Efficient Pyramid Transformer for Semantic Segmentation

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature either focuses on context modeling or boundary refinement, which is less…

Computer Vision and Pattern Recognition · Computer Science 2021-07-30 Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li

IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation

Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters with these aspects, we present a simple yet powerful semantic segmentation…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Lihua Fu, Haoyue Tian, Xiangping Bryce Zhai, Pan Gao, Xiaojiang Peng

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

It is a challenging task to learn discriminative representation from images and videos, due to large local redundancy and complex global dependency in these visual data. Convolution neural networks (CNNs) and vision transformers (ViTs) have…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

MacFormer: Semantic Segmentation with Fine Object Boundaries

Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry

CTNet: Context-based Tandem Network for Semantic Segmentation

Contextual information has been shown to be powerful for semantic segmentation. This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual…

Computer Vision and Pattern Recognition · Computer Science 2021-04-21 Zechao Li, Yanpeng Sun, Jinhui Tang

UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery

Semantic segmentation of remotely sensed urban scene images is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment. Driven by rapid…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Libo Wang, Rui Li, Ce Zhang, Shenghui Fang, Chenxi Duan, Xiaoliang Meng, Peter M. Atkinson

TCFormer: Visual Recognition via Token Clustering Transformer

Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions.…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, Guangwei Gao

Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers

Semantic segmentation necessitates approaches that learn high-level characteristics while dealing with enormous amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features to achieve this aim. However, due…

Computer Vision and Pattern Recognition · Computer Science 2023-07-19 Hasan AlMarzouqi, Lyes Saad Saoud

Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression

Entropy estimation is essential for the performance of learned image compression. It has been demonstrated that a transformer-based entropy model is of critical importance for achieving a high compression ratio, however, at the expense of a…

Image and Video Processing · Electrical Eng. & Systems 2024-02-28 A. Burakhan Koyuncu, Panqi Jia, Atanas Boev, Elena Alshina, Eckehard Steinbach

Context Encoding for Semantic Segmentation

Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining…

Computer Vision and Pattern Recognition · Computer Science 2018-03-26 Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal

Contextual Vision Transformers for Robust Representation Learning

We introduce Contextual Vision Transformers (ContextViT), a method designed to generate robust image representations for datasets experiencing shifts in latent factors across various groups. Derived from the concept of in-context learning,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-02 Yujia Bao, Theofanis Karaletsos

SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks

Improving the efficiency of state-of-the-art methods in semantic segmentation requires overcoming the increasing computational cost as well as issues such as fusing semantic information from global and local contexts. Based on the recent…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Serdar Erisen

MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation

Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-16 Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang

Vision Transformers: From Semantic Segmentation to Dense Prediction

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr

FTCFormer: Fuzzy Token Clustering Transformer for Image Classification

Transformer-based deep neural networks have achieved remarkable success across various computer vision tasks, largely attributed to their long-range self-attention mechanism and scalability. However, most transformer architectures embed…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Muyi Bao, Changyu Zeng, Yifan Wang, Zhengni Yang, Zimu Wang, Guangliang Cheng, Jun Qi, Wei Wang