Related papers: Token Pruning in Audio Transformers: Optimizing Pe…

200 papers

Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, Davide Scaramuzza

Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still…

Computer Vision and Pattern Recognition · Computer Science 2023-08-10 Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

The adoption of Vision Transformers (ViTs) in resource-constrained applications necessitates improvements in inference throughput. To this end several token pruning and merging approaches have been proposed that improve efficiency by…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Benjamin Bergner, Christoph Lippert, Aravindh Mahendran

Token compression is essential for reducing the computational and memory requirements of transformer models, enabling their deployment in resource-constrained environments. In this work, we propose an efficient and hardware-compatible token…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua

Since its inception, Vision Transformer (ViT) has emerged as a prevalent model in the computer vision domain. Nonetheless, the multi-head self-attention (MHSA) mechanism in ViT is computationally expensive due to its calculation of…

Computer Vision and Pattern Recognition · Computer Science 2023-07-25 Zhe Bian, Zhe Wang, Wenqiang Han, Kangping Wang

Vision-Language Models (VLMs) demand substantial computational resources during inference, largely due to the extensive visual input tokens for representing visual information. Previous studies have noted that visual tokens tend to receive…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan

Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. While structured weight pruning and token compression have emerged as…

Computer Vision and Pattern Recognition · Computer Science 2026-02-19 Hyunchan Moon, Cheonjun Park, Steven L. Waslander

State Space Models (SSMs) have the advantage of keeping linear computational complexity compared to attention modules in transformers, and have been applied to vision tasks as a new type of powerful vision foundation model. Inspired by the…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang

Token compression techniques have recently emerged as powerful tools for accelerating Vision Transformer (ViT) inference in computer vision. Due to the quadratic computational complexity with respect to the token sequence length, these…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Phat Nguyen, Ngai-Man Cheung

Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2023-04-24 Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang

Vision Transformer (ViT) has achieved impressive results across various vision tasks, yet its high computational cost limits practical applications. Recent methods have aimed to reduce ViT's $O(n^2)$ complexity by pruning unimportant…

Computer Vision and Pattern Recognition · Computer Science 2025-07-17 Yi-Kuan Hsieh, Jun-Wei Hsieh, Xin Li, Yu-Ming Chang, Yu-Chee Tseng

Recent audio-language models have shown impressive performance across a wide range of audio tasks and are increasingly capable of handling long audio inputs. However, the computing costs in these models heavily depend on sequence length,…

Vision transformer has emerged as a new paradigm in computer vision, showing excellent performance while accompanied by expensive computational cost. Image token pruning is one of the main approaches for ViT compression, due to the facts…

Computer Vision and Pattern Recognition · Computer Science 2023-07-07 Xiangcheng Liu, Tianyi Wu, Guodong Guo

Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Pruning, a traditional model…

Computer Vision and Pattern Recognition · Computer Science 2022-09-22 Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Vision Transformers (ViTs) have achieved state-of-the-art accuracy on various computer vision tasks. However, their high computational complexity prevents them from being applied to many real-world applications. Weight and token pruning are…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-15 Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna

Vision Transformers (ViTs) have emerged as the backbone of many segmentation models, consistently achieving state-of-the-art (SOTA) performance. However, their success comes at a significant computational cost. Image token pruning is one of…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Hanning Chen, Yang Ni, Wenjun Huang, Yezi Liu, SungHeon Jeong, Fei Wen, Nathaniel Bastian, Hugo Latapie, Mohsen Imani

Contrastive image-text pre-trained models such as CLIP have shown remarkable adaptability to downstream tasks. However, they face challenges due to the high computational requirements of the Vision Transformer (ViT) backbone. Current…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Cheng-En Wu, Jinhong Lin, Yu Hen Hu, Pedro Morgado

Despite the success of transformers on various computer vision tasks, they suffer from excessive memory and computational cost. Some works present dynamic vision transformers to accelerate inference by pruning redundant tokens. A key to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Fengyuan Shi, Limin Wang

Vision Transformers (ViTs) deliver state-of-the-art accuracy but their quadratic attention cost and redundant computations severely hinder deployment on latency and resource-constrained platforms. Existing pruning approaches treat either…

Computer Vision and Pattern Recognition · Computer Science 2025-12-24 Mohammad Helal Uddin, Liam Seymour, Sabur Baidya