Related papers: Patch Slimming for Efficient Vision Transformers

Life Regression based Patch Slimming for Vision Transformers

Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, the high inference computation cost poses a new challenge.…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-12 · Jiawei Chen, Lin Chen, Jiang Yang, Tianqi Shi, Lechao Cheng, Zunlei Feng, Mingli Song

PatchDropout: Economizing Vision Transformers Using Patch Dropout

Vision transformers have demonstrated the potential to outperform CNNs in a variety of vision tasks. But the computational and memory requirements of these models prohibit their use in many applications, especially those that depend on…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-06 · Yue Liu, Christos Matsoukas, Fredrik Strand, Hossein Azizpour, Kevin Smith

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, we propose a…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-22 · Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Multi-Dimensional Model Compression of Vision Transformer

Vision transformers (ViT) have recently attracted considerable attention, but the huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along one dimension only, which may…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-04 · Zejiang Hou, Sun-Yuan Kung

Super Vision Transformer

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-20 · Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-21 · Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Efficient Vision Transformer for Human Pose Estimation via Patch Selection

While Convolutional Neural Networks (CNNs) have been widely successful in 2D human pose estimation, Vision Transformers (ViTs) have emerged as a promising alternative to CNNs, boosting state-of-the-art performance. However, the quadratic…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-23 · Kaleab A. Kinfu, Rene Vidal

CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction

Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices. We explore the sparsity in ViT and observe that…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-10 · Zhuoran Song, Yihong Xu, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang

Compress image to patches for Vision Transformer

The Vision Transformer (ViT) has made significant strides in the field of computer vision. However, as the depth of the model and the resolution of the input images increase, the computational cost associated with training and running ViT…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-18 · Xinfeng Zhao, Yaoru Sun

Patch Pruning Strategy Based on Robust Statistical Measures of Attention Weight Diversity in Vision Transformers

Multi-head self-attention is a distinctive feature extraction mechanism of vision transformers that computes pairwise relationships among all input patches, contributing significantly to their high performance. However, it is known to incur…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-28 · Yuki Igaue, Hiroaki Aizawa

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-06 · Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou

Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies

In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. Unlike convolutional neural networks (CNNs), which…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Shaibal Saha, Lanyu Xu

Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. It can search a sub-structure from the original model end-to-end across…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-26 · Arnav Chavan, Zhiqiang Shen, Zhuang Liu, Zechun Liu, Kwang-Ting Cheng, Eric Xing

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-01 · Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-04 · Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

Rethinking Vision Transformer Depth via Structural Reparameterization

The computational overhead of Vision Transformers in practice stems fundamentally from their deep architectures, yet existing acceleration strategies have primarily targeted algorithmic-level optimizations such as token pruning and…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-26 · Chengwei Zhou, Vipin Chaudhary, Gourav Datta

Effect of Patch Size on Fine-Tuning Vision Transformers in Two-Dimensional and Three-Dimensional Medical Image Classification

Vision Transformers (ViTs) and their variants have become state-of-the-art in many computer vision tasks and are widely used as backbones in large-scale vision and vision-language foundation models. While substantial research has focused on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-24 · Massoud Dehghan, Ramona Woitek, Amirreza Mahbod

Learning Efficient Convolutional Networks through Network Slimming

The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-23 · Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang

PPT: Token Pruning and Pooling for Efficient Vision Transformers

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-06 · Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Efficient Partitioning Vision Transformer on Edge Devices for Distributed Inference

Deep learning models are increasingly utilized on resource-constrained edge devices for real-time data analytics. Recently, Vision Transformers and their variants have shown exceptional performance in various computer vision tasks. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-22 · Xiang Liu, Yijun Song, Xia Li, Yifei Sun, Huiying Lan, Zemin Liu, Linshan Jiang, Jialin Li