Related papers: I-ViT: Integer-only Quantization for Efficient Vis…

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Jordan Sassoon, Michal Szczepanski, Martyna Poreba

I-BERT: Integer-only BERT Quantization

Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference…

Computation and Language · Computer Science 2022-05-02 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high…

Machine Learning · Computer Science 2024-05-02 Dayou Du, Gu Gong, Xiaowen Chu

IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers

Previous Quantization-Aware Training (QAT) methods for vision transformers rely on expensive retraining to recover accuracy loss in non-linear layer quantization, limiting their use in resource-constrained environments. In contrast,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Gihwan Kim, Jemin Lee, Hyungshin Kim

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices. Among the…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

Q-ViT: Fully Differentiable Quantization for Vision Transformer

In this paper, we propose a fully differentiable quantization method for vision transformer (ViT) named Q-ViT, in which both the quantization scales and bit-widths are learnable parameters. Specifically, based on our observation that…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Zhexin Li, Tong Yang, Peisong Wang, Jian Cheng

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

ME-ViT: A Single-Load Memory-Efficient FPGA Accelerator for Vision Transformers

Vision Transformers (ViTs) have emerged as a state-of-the-art solution for object classification tasks. However, their computational demands and high parameter count make them unsuitable for real-time inference, prompting the need for…

Image and Video Processing · Electrical Eng. & Systems 2024-02-16 Kyle Marino, Pengmiao Zhang, Viktor Prasanna

Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware

Pre-trained vision transformers have achieved remarkable performance across various visual tasks but suffer from expensive computational and memory costs. While model quantization reduces memory usage by lowering precision, these models…

Machine Learning · Computer Science 2025-08-06 Ching-Yi Lin, Sahil Shah

Token Turing Machines are Efficient Vision Models

We propose Vision Token Turing Machines (ViTTM), an efficient, low-latency, memory-augmented Vision Transformer (ViT). Our approach builds on Neural Turing Machines and Token Turing Machines, which were applied to NLP and sequential visual…

Computer Vision and Pattern Recognition · Computer Science 2025-01-27 Purvish Jajal, Nick John Eliopoulos, Benjamin Shiue-Hal Chou, George K. Thiruvathukal, James C. Davis, Yung-Hsiang Lu

Patch-wise Mixed-Precision Quantization of Vision Transformer

As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks. However, Vision Transformers (ViTs) require complex self-attention…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread…

Computer Vision and Pattern Recognition · Computer Science 2024-05-20 Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

P²-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored…

Artificial Intelligence · Computer Science 2024-05-31 Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang

M²-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization

Although Vision Transformers (ViTs) have achieved significant success, their intensive computations and substantial memory overheads challenge their deployment on edge devices. To address this, efficient ViTs have emerged, typically…

Hardware Architecture · Computer Science 2024-10-15 Yanbiao Liang, Huihong Shi, Zhongfeng Wang

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science 2017-12-19 Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

Interpretability-Aware Vision Transformer

Vision Transformers (ViTs) have become prominent models for solving various vision tasks. However, the interpretability of ViTs has not kept pace with their promising performance. While there has been a surge of interest in developing…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu

FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods have been developed mainly on Convolutional Neural Networks (CNNs), and…

Computer Vision and Pattern Recognition · Computer Science 2023-02-20 Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou

AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers

Post-training quantization (PTQ) has emerged as a promising solution for reducing the storage and computational cost of vision transformers (ViTs). Recent advances primarily aim at crafting quantizers to deal with peculiar activations…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Runqing Jiang, Ye Zhang, Longguang Wang, Pengpeng Yu, Yulan Guo

QSViT: A Methodology for Quantizing Spiking Vision Transformers

Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capability in resource-constrained embedded AI systems is challenging due to their inherent…

Neural and Evolutionary Computing · Computer Science 2026-01-06 Rachmad Vidya Wicaksana Putra, Saad Iftikhar, Muhammad Shafique

ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose…

Computer Vision and Pattern Recognition · Computer Science 2024-06-27 Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun