
Related papers: Accelerating Vision Transformers with Adaptive Pat…

200 papers

Vision Transformers (ViTs) and their variants have become state-of-the-art in many computer vision tasks and are widely used as backbones in large-scale vision and vision-language foundation models. While substantial research has focused on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-24 · Massoud Dehghan, Ramona Woitek, Amirreza Mahbod

Vision Transformers (ViTs) have emerged as the state-of-the-art architecture in representation learning, leveraging self-attention mechanisms to excel in various tasks. ViTs split images into fixed-size patches, constraining them to a…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Aswathi Varma, Suprosanna Shit, Chinmay Prabhakar, Daniel Scholz, Hongwei Bran Li, Bjoern Menze, Daniel Rueckert, Benedikt Wiestler

We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-10 · Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov

The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT)…

Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-01 · Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to…

Machine Learning · Computer Science · 2023-02-23 · Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the…

Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems. These models are based on multi-head self-attention mechanisms that can flexibly attend to a sequence of image patches to encode…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-29 · Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA.…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-15 · Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-29 · Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. Unlike convolutional neural networks (CNNs), which…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Shaibal Saha, Lanyu Xu

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

Recently, the Vision Transformer (ViT), which applied the transformer structure to the image classification task, has outperformed convolutional neural networks. However, the high performance of the ViT results from pre-training using a…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-28 · Seung Hoon Lee, Seunghyun Lee, Byung Cheol Song

We introduce the notion of a Patch Sampling Schedule (PSS) that varies the number of Vision Transformer (ViT) patches used per batch during training. Since not all patches are equally important for most vision objectives (e.g.,…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Bradley McDanel, Chi Phuong Huynh

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in…

Machine Learning · Computer Science · 2023-07-06 · Qiqi Zhou, Yichen Zhu

Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, we propose a…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-22 · Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

While Convolutional Neural Networks (CNNs) have been widely successful in 2D human pose estimation, Vision Transformers (ViTs) have emerged as a promising alternative to CNNs, boosting state-of-the-art performance. However, the quadratic…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-23 · Kaleab A. Kinfu, Rene Vidal

Recently, foundation models based on Vision Transformers (ViTs) have become widely available. However, their fine-tuning process is highly resource-intensive, and it hinders their adoption in several edge or low-energy applications. To this…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-19 · Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane

The vision transformer splits each image into a sequence of tokens with fixed length and processes the tokens in the same way as words in natural language processing. More tokens normally lead to better performance but considerably…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-07 · Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

The Vision Transformer (ViT) has made significant strides in the field of computer vision. However, as the depth of the model and the resolution of the input images increase, the computational cost associated with training and running ViT…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-18 · Xinfeng Zhao, Yaoru Sun