Related papers: PatchDropout: Economizing Vision Transformers Usin…

Patch Slimming for Efficient Vision Transformers

This paper studies the efficiency problem for visual transformers by excavating redundant calculation in given networks. The recent transformer architecture has demonstrated its effectiveness for achieving excellent performance on a series…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao

Compress image to patches for Vision Transformer

The Vision Transformer (ViT) has made significant strides in the field of computer vision. However, as the depth of the model and the resolution of the input images increase, the computational cost associated with training and running ViT…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Xinfeng Zhao, Yaoru Sun

Super Vision Transformer

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction

Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices. We explore the sparsity in ViT and observe that…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Zhuoran Song, Yihong Xu, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

Effective Vision Transformer Training: A Data-Centric Perspective

Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but the training of ViTs is much harder than CNNs. In this paper, we define several metrics, including Dynamic Data Proportion…

Computer Vision and Pattern Recognition · Computer Science 2022-09-30 Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

ViR:the Vision Reservoir

The most recent year has witnessed the success of applying the Vision Transformer (ViT) for image classification. However, there are still evidences indicating that ViT often suffers following two aspects, i) the high computation and the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Xian Wei, Bin Wang, Mingsong Chen, Ji Yuan, Hai Lan, Jiehuang Shi, Xuan Tang, Bo Jin, Guozhang Chen, Dongping Yang

FlexiViT: One Model for All Patch Sizes

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

FocusedDropout for Convolutional Neural Network

In convolutional neural network (CNN), dropout cannot work well because dropped information is not entirely obscured in convolutional layers where features are correlated spatially. Except randomly discarding regions or channels, many…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Tianshu Xie, Minghui Liu, Jiali Deng, Xuan Cheng, Xiaomin Wang, Ming Liu

Flip-Rotate-Pooling Convolution and Split Dropout on Convolution Neural Networks for Image Classification

This paper presents a new version of Dropout called Split Dropout (sDropout) and rotational convolution techniques to improve CNNs' performance on image classification. The widely used standard Dropout has advantage of preventing deep…

Computer Vision and Pattern Recognition · Computer Science 2015-08-03 Fa Wu, Peijun Hu, Dexing Kong

Effect of Patch Size on Fine-Tuning Vision Transformers in Two-Dimensional and Three-Dimensional Medical Image Classification

Vision Transformers (ViTs) and their variants have become state-of-the-art in many computer vision tasks and are widely used as backbones in large-scale vision and vision-language foundation models. While substantial research has focused on…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Massoud Dehghan, Ramona Woitek, Amirreza Mahbod

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

PatchRot: A Self-Supervised Technique for Training Vision Transformers

Vision transformers require a huge amount of labeled data to outperform convolutional neural networks. However, labeling a huge dataset is a very expensive process. Self-supervised learning techniques alleviate this problem by learning…

Computer Vision and Pattern Recognition · Computer Science 2022-10-31 Sachin Chhabra, Prabal Bijoy Dutta, Hemanth Venkateswara, Baoxin Li

Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models

Dropout is a widely used regularization technique which improves the generalization ability of a model by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which aims for applying dropout to improve the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Biao Chen, Lin Zuo, Mengmeng Jing, Kunbin He, Yuchen Wang

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, We propose a…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Accelerating Vision Transformers with Adaptive Patch Sizes

Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, resulting in long input sequence lengths for high-resolution images. We present Adaptive Patch Transformers (APT), which addresses…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Rohan Choudhury, JungEun Kim, Jinhyung Park, Eunho Yang, László A. Jeni, Kris M. Kitani

RAViT: Resolution-Adaptive Vision Transformer

Vision transformers have recently made a breakthrough in computer vision showing excellent performance in terms of precision for numerous applications. However, their computational cost is very high compared to alternative approaches such…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Martial Guidez, Stefan Duffner, Christophe Garcia

Optimizing Vision Transformers with Data-Free Knowledge Transfer

The groundbreaking performance of transformers in Natural Language Processing (NLP) tasks has led to their replacement of traditional Convolutional Neural Networks (CNNs), owing to the efficiency and accuracy achieved through the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Gousia Habib, Damandeep Singh, Ishfaq Ahmad Malik, Brejesh Lall

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-06 Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou