Related papers: Dynamic Grained Encoder for Vision Transformers

200 papers

Fine-grained recognition involves the classification of images from subordinate macro-categories, and it is challenging due to small inter-class differences. To overcome this, most methods perform discriminative feature selection enabled by…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Edwin Arkel Rios, Min-Chun Hu, Bo-Cheng Lai

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

In this paper, we propose a transformer based approach for visual grounding. Unlike previous proposal-and-rank frameworks that rely heavily on pretrained object detectors or proposal-free frameworks that upgrade an off-the-shelf one-stage…

Computer Vision and Pattern Recognition · Computer Science 2022-03-15 Ye Du, Zehua Fu, Qingjie Liu, Yunhong Wang

Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

Transformers with powerful global relation modeling abilities have been introduced to fundamental computer vision tasks recently. As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image…

Computer Vision and Pattern Recognition · Computer Science 2021-08-05 Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a…

Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping layer, that dynamically assigns tokens to a…

Computer Vision and Pattern Recognition · Computer Science 2025-05-23 Guillem Brasó, Aljoša Ošep, Laura Leal-Taixé

Most advanced visual grounding methods rely on Transformers for visual-linguistic feature fusion. However, these Transformer-based approaches encounter a significant drawback: the computational costs escalate quadratically due to the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Wei Chen, Long Chen, Yu Wu

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring…

Computer Vision and Pattern Recognition · Computer Science 2022-04-07 Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr

Image Classification is a fundamental task in the field of computer vision that frequently serves as a benchmark for gauging advancements in Computer Vision. Over the past few years, significant progress has been made in image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-06 Mahmoud Khalil, Ahmad Khalil, Alioune Ngom

We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures…

Computer Vision and Pattern Recognition · Computer Science 2023-02-10 Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-Yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into…

Computer Vision and Pattern Recognition · Computer Science 2021-03-26 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Semantic segmentation of remotely sensed urban scene images is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment. Driven by rapid…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Libo Wang, Rui Li, Ce Zhang, Shenghui Fang, Chenxi Duan, Xiaoliang Meng, Peter M. Atkinson

Fine-grained action recognition is a challenging task in computer vision. As fine-grained datasets have small inter-class variations in spatial and temporal space, a fine-grained action recognition model requires good temporal reasoning and…

Computer Vision and Pattern Recognition · Computer Science 2022-08-04 Mei Chee Leong, Haosong Zhang, Hui Li Tan, Liyuan Li, Joo Hwee Lim

Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently been done on employing…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He

Vision Transformers (ViTs) have demonstrated strong capabilities in capturing global dependencies but often struggle to efficiently represent fine-grained local details. Existing multi-scale approaches alleviate this issue by integrating…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Qiyang Yu, Yu Fang, Tianrui Li, Xuemei Cao, Yan Chen, Jianghao Li, Fan Min

Visual place recognition (VPR) aims to determine the general geographical location of a query image by retrieving visually similar images from a large geo-tagged database. To obtain a global representation for each place image, most…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Tong Jin, Feng Lu, Shuyu Hu, Chun Yuan, Yunpeng Liu

Vision transformer based models bring significant improvements for image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation tasks, their use of computational resources can be…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Manyi Yao, Abhishek Aich, Yumin Suh, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification. We here extend this work and propose a transformer-based approach for image retrieval: we adopt vision transformers…

Computer Vision and Pattern Recognition · Computer Science 2021-02-11 Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou