Related papers: PatchDropout: Economizing Vision Transformers Usin…

200 papers

This paper studies the efficiency problem for visual transformers by excavating redundant calculation in given networks. The recent transformer architecture has demonstrated its effectiveness for achieving excellent performance on a series…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao

The Vision Transformer (ViT) has made significant strides in the field of computer vision. However, as the depth of the model and the resolution of the input images increase, the computational cost associated with training and running ViT…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Xinfeng Zhao, Yaoru Sun

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices. We explore the sparsity in ViT and observe that…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Zhuoran Song, Yihong Xu, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but the training of ViTs is much harder than CNNs. In this paper, we define several metrics, including Dynamic Data Proportion…

Computer Vision and Pattern Recognition · Computer Science 2022-09-30 Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

The most recent year has witnessed the success of applying the Vision Transformer (ViT) for image classification. However, there is still evidence that ViT often suffers in the following two aspects: i) the high computation and the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Xian Wei, Bin Wang, Mingsong Chen, Ji Yuan, Hai Lan, Jiehuang Shi, Xuan Tang, Bo Jin, Guozhang Chen, Dongping Yang

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the…

In convolutional neural networks (CNNs), dropout does not work well because dropped information is not entirely obscured in convolutional layers, where features are spatially correlated. Besides randomly discarding regions or channels, many…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Tianshu Xie, Minghui Liu, Jiali Deng, Xuan Cheng, Xiaomin Wang, Ming Liu

This paper presents a new version of Dropout called Split Dropout (sDropout) and rotational convolution techniques to improve CNNs' performance on image classification. The widely used standard Dropout has the advantage of preventing deep…

Computer Vision and Pattern Recognition · Computer Science 2015-08-03 Fa Wu, Peijun Hu, Dexing Kong

Vision Transformers (ViTs) and their variants have become state-of-the-art in many computer vision tasks and are widely used as backbones in large-scale vision and vision-language foundation models. While substantial research has focused on…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Massoud Dehghan, Ramona Woitek, Amirreza Mahbod

Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

Vision transformers require a huge amount of labeled data to outperform convolutional neural networks. However, labeling a huge dataset is a very expensive process. Self-supervised learning techniques alleviate this problem by learning…

Computer Vision and Pattern Recognition · Computer Science 2022-10-31 Sachin Chhabra, Prabal Bijoy Dutta, Hemanth Venkateswara, Baoxin Li

Dropout is a widely used regularization technique which improves the generalization ability of a model by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which aims for applying dropout to improve the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Biao Chen, Lin Zuo, Mengmeng Jing, Kunbin He, Yuchen Wang

Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, resulting in long input sequence lengths for high-resolution images. We present Adaptive Patch Transformers (APT), which addresses…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Rohan Choudhury, JungEun Kim, Jinhyung Park, Eunho Yang, László A. Jeni, Kris M. Kitani

Vision transformers have recently made a breakthrough in computer vision showing excellent performance in terms of precision for numerous applications. However, their computational cost is very high compared to alternative approaches such…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Martial Guidez, Stefan Duffner, Christophe Garcia

The groundbreaking performance of transformers in Natural Language Processing (NLP) tasks has led to their replacement of traditional Convolutional Neural Networks (CNNs), owing to the efficiency and accuracy achieved through the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Gousia Habib, Damandeep Singh, Ishfaq Ahmad Malik, Brejesh Lall

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-06 Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou