Related papers: BinaryViT: Towards Efficient and Accurate Binary V…

BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models

With the growing popularity and size of vision transformers (ViTs), there has been increasing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-04 · Phuoc-Hoan Charles Le, Xinlin Li

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices. Among the…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

BiViT: Extremely Compressed Binary Vision Transformer

Model binarization can significantly compress model size, reduce energy consumption, and accelerate inference through efficient bit-wise operations. Although binarizing convolutional neural networks has been extensively studied, there is…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-06 · Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Bi-ViT: Pushing the Limit of Vision Transformer Quantization

Vision transformer (ViT) quantization offers a promising prospect to facilitate deploying large pre-trained networks on resource-limited devices. Fully-binarized ViTs (Bi-ViT) that push the quantization of ViTs to its limit remain…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Yanjing Li, Sheng Xu, Mingbao Lin, Xianbin Cao, Chuanjian Liu, Xiao Sun, Baochang Zhang

BHViT: Binarized Hybrid Vision Transformer

Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNN), offering a potential solution to the deployment challenges faced by Vision Transformers (ViTs)…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-07 · Tian Gao, Zhiyuan Zhang, Yu Zhang, Huajun Liu, Kaijie Yin, Chengzhong Xu, Hui Kong

GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples

Vision Transformer (ViT) has performed remarkably in various computer vision tasks. Nonetheless, affected by the massive amount of parameters, ViT usually suffers from serious overfitting problems with a relatively limited number of…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-19 · Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong

High-Fidelity Differential-information Driven Binary Vision Transformer

The binarization of vision transformers (ViTs) offers a promising approach to addressing the trade-off between high computational/storage demands and the constraints of edge-device deployment. However, existing binary ViT methods often…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-15 · Tian Gao, Zhiyuan Zhang, Kaijie Yin, Xu-Cheng Zhong, Hui Kong

TerViT: An Efficient Ternary Vision Transformer

Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices. In this paper, we introduce a ternary…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-24 · Sheng Xu, Yanjing Li, Teli Ma, Bohan Zeng, Baochang Zhang, Peng Gao, Jinhu Lv

Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy

As Vision Transformers (ViTs) increasingly set new benchmarks in computer vision, their practical deployment on inference engines is often hindered by their significant memory bandwidth and (on-chip) memory footprint requirements. This…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-12 · Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high…

Machine Learning · Computer Science · 2024-05-02 · Dayou Du, Gu Gong, Xiaowen Chu

Boosting Binary Neural Networks via Dynamic Thresholds Learning

Developing lightweight Deep Convolutional Neural Networks (DCNNs) and Vision Transformers (ViTs) has become one of the focuses in vision research, since low computational cost is essential for deploying vision models on edge devices.…

Image and Video Processing · Electrical Eng. & Systems · 2022-11-11 · Jiehua Zhang, Xueyang Zhang, Zhuo Su, Zitong Yu, Yanghe Feng, Xin Lu, Matti Pietikäinen, Li Liu

MiniViT: Compressing Vision Transformers with Weight Multiplexing

Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability. However, ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-15 · Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-20 · Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

Comprehensive Survey of Model Compression and Speed up for Vision Transformers

Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks. However, their practical deployment is hampered by high computational and memory demands. This study…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-17 · Feiyang Chen, Ziqian Luo, Lisang Zhou, Xueting Pan, Ying Jiang

Unified Visual Transformer Compression

Vision transformers (ViTs) have gained popularity recently. Even without customized image operators such as convolutions, ViTs can yield competitive performance when properly trained on massive data. However, the computational overhead of…

Machine Learning · Computer Science · 2022-03-17 · Shixing Yu, Tianlong Chen, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Liu, Zhangyang Wang

Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies

In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. Unlike convolutional neural networks (CNNs), which…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Shaibal Saha, Lanyu Xu

Making Vision Transformers Truly Shift-Equivariant

For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs' output remains sensitive to small spatial shifts in the input, i.e.,…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-30 · Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh

BiT: Robustly Binarized Multi-distilled Transformer

Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained…

Machine Learning · Computer Science · 2022-10-04 · Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-27 · Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-01 · Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang