Related papers: Efficiency 360: Efficient Vision Transformers

Transformers in Vision: A Survey

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-20 · Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-06 · Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Self-attention in Transformers comes with a high computational cost because of their quadratic computational complexity, but their effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-25 · Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-21 · Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Vision Transformers: State of the Art and Research Challenges

Transformers have achieved great success in natural language processing. Due to the powerful capability of the self-attention mechanism in transformers, researchers have developed vision transformers for a variety of computer vision tasks, such as…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-08 · Bo-Kai Ruan, Hong-Han Shuai, Wen-Huang Cheng

A Comprehensive Study of Vision Transformers in Image Classification Tasks

Image classification is a fundamental task in the field of computer vision that frequently serves as a benchmark for gauging advancements in computer vision. Over the past few years, significant progress has been made in image…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-06 · Mahmoud Khalil, Ahmad Khalil, Alioune Ngom

Improving the Efficiency of Transformers for Resource-Constrained Devices

Transformers provide promising accuracy and have become popular and used in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters, memory and computation…

Machine Learning · Computer Science · 2021-07-01 · Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz

Image Recognition with Online Lightweight Vision Transformer: A Survey

The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Zherui Zhang, Rongtao Xu, Jie Zhou, Changwei Wang, Xingtian Pei, Wenhao Xu, Jiguang Zhang, Li Guo, Longxiang Gao, Wenbo Xu, Shibiao Xu

A Survey on Visual Transformer

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-11 · Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives

Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision. They have shown promising results over convolutional neural networks in fundamental computer vision…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-14 · Rojina Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh Aghaeipoor

Reversible Vision Transformers

We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-10 · Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-Yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Efficient Transformers: A Survey

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science · 2022-03-15 · Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-04 · Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Adventures of Trustworthy Vision-Language Models: A Survey

Recently, transformers have become incredibly popular in computer vision and vision-language tasks. This notable rise in their usage can be primarily attributed to the capabilities offered by attention mechanisms and the outstanding ability…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-08 · Mayank Vatsa, Anubhooti Jain, Richa Singh

Vision Transformer Finetuning Benefits from Non-Smooth Components

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper,…

Machine Learning · Computer Science · 2026-02-10 · Ambroise Odonnat, Laetitia Chapel, Romain Tavenard, Ievgen Redko

Vision Transformer Pruning

Vision transformers have achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands are hindering deployment to mobile devices. Here we present a…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-17 · Mingjian Zhu, Yehui Tang, Kai Han

3D Vision with Transformers: A Survey

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-09 · Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-28 · Matthew Dutson, Yin Li, Mohit Gupta

Vision Transformer Computation and Resilience for Dynamic Inference

State-of-the-art deep learning models for computer vision tasks are based on the transformer architecture and often deployed in real-time applications. In this scenario, the resources available for every inference can vary, so it is useful…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-17 · Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, Mark Horowitz

Machine Learning for Brain Disorders: Transformers and Visual Transformers

Transformers were initially introduced for natural language processing (NLP) tasks, but they were quickly adopted by most deep learning fields, including computer vision. They measure the relationships between pairs of input tokens (words in…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-22 · Robin Courant, Maika Edberg, Nicolas Dufour, Vicky Kalogeiton