Related papers: ViM-VQ: Efficient Post-Training Vector Quantizatio…

Post-Training Quantization for Vision Mamba with k-Scaled Quantization and Reparameterization

The Mamba model, utilizing a structured state-space model (SSM), offers linear time complexity and demonstrates significant potential. Vision Mamba (ViM) extends this framework to vision tasks by incorporating a bidirectional SSM and patch…

Image and Video Processing · Electrical Eng. & Systems 2025-02-14 Bo-Yun Shi, Yi-Cheng Lo, An-Yeu Wu, Yi-Min Tsai

PTQ4VM: Post-Training Quantization for Visual Mamba

Visual Mamba is an approach that extends the selective state space model, Mamba, to vision tasks. It processes image tokens sequentially in a fixed order, accumulating information to generate outputs. Despite its growing popularity for…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park

ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA

Vision Mamba (ViM) models offer a compelling efficiency advantage over Transformers by leveraging the linear complexity of State Space Models (SSMs), yet efficiently deploying them on FPGAs remains challenging. Linear layers struggle with…

Hardware Architecture · Computer Science 2026-05-05 Shengzhe Lyu, Yuhan She, Patrick S. Y. Hung, Ray C. C. Cheung, Weitao Xu

MGVQ: Synergizing Multi-dimensional Sensitivity-Aware and Gradient-Hessian Fusion for Vector Quantization

Vision-Language Models (VLMs) achieve outstanding performance, yet their huge model size severely hinders deployment on edge devices with limited resources. As an efficient model compression technique, vector quantization (VQ) excels in…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Zhong Wang, Zukang Xu, Xing Hu, Dawei Yang

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down…

Artificial Intelligence · Computer Science 2024-10-23 Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang

Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-08 Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

Mamba is an efficient sequence model that rivals Transformers and demonstrates significant potential as a foundational architecture for various tasks. Quantization is commonly used in neural networks to reduce model size and computational…

Machine Learning · Computer Science 2025-03-12 Zukang Xu, Yuxuan Yue, Xing Hu, Zhihang Yuan, Zixu Jiang, Zhixuan Chen, Jiangyong Yu, Chen Xu, Sifan Zhou, Dawei Yang

VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual Representation

Recent advances in Vision Transformers (ViTs) and State Space Models (SSMs) have challenged the dominance of Convolutional Neural Networks (CNNs) in computer vision. ViTs excel at capturing global context, and SSMs like Mamba offer linear…

Computer Vision and Pattern Recognition · Computer Science 2025-09-08 Mustafa Munir, Alex Zhang, Radu Marculescu

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years, as it avoids computationally intensive model retraining. Nevertheless, current PTQ methods for Vision Transformers…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Zhuguanyu Wu, Shihe Wang, Jiayi Zhang, Jiaxin Chen, Yunhong Wang

Q-VLM: Post-training Quantization for Large Vision-Language Models

In this paper, we propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Juntao Zhang, Shaogeng Liu, Jun Zhou, Kun Bian, You Zhou, Jianning Liu, Pei Zhang, Bingyan Liu

Vivim: a Video Vision Mamba for Medical Video Segmentation

Medical video segmentation gains increasing attention in clinical practice due to the redundant dynamic references in video frames. However, traditional convolutional neural networks have a limited receptive field and transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2024-08-02 Yijun Yang, Zhaohu Xing, Lequan Yu, Chunwang Huang, Huazhu Fu, Lei Zhu

BVI-Mamba: Video Enhancement Using a Visual State-Space Model for Low-Light and Underwater Environments

Videos captured in low-light and underwater conditions often suffer from distortions such as noise, low contrast, color imbalance, and blur. These issues not only limit visibility but also degrade automatic tasks like detection.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Guoxi Huang, Ruirui Lin, Yini Li, David R. Bull, Nantheera Anantrasirichai

MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment

The rapid growth of long-duration, high-definition videos has made efficient video quality assessment (VQA) a critical challenge. Existing research typically tackles this problem through two main strategies: reducing model parameters and…

Computer Vision and Pattern Recognition · Computer Science 2025-04-23 Yachun Mi, Yu Li, Weicheng Meng, Chaofeng Chen, Chen Hui, Shaohui Liu

Selective Visual Prompting in Vision Mamba

Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks in a computationally efficient manner, attributed to their unique design of selective state space models. To further extend…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Yifeng Yao, Zichen Liu, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread…

Computer Vision and Pattern Recognition · Computer Science 2024-05-20 Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

LocalMamba: Visual State Space Model with Windowed Selective Scan

Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution

Video super-resolution remains a major challenge in low-level vision tasks. To date, CNN- and Transformer-based methods have delivered impressive results. However, CNNs are limited by local receptive fields, while Transformers struggle with…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim

VLMQ: Token Saliency-Driven Post-Training Quantization for Vision-language Models

Post-training quantization (PTQ) has emerged as an effective technique for compressing large models and accelerating inference without retraining. While PTQ has been extensively studied in large language models (LLMs), its application to…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Yufei Xue, Yushi Huang, Jiawei Shao, Lunjie Zhu, Chi Zhang, Xuelong Li, Jun Zhang

Mamba-X: An End-to-End Vision Mamba Accelerator for Edge Computing Devices

Transformers have proven effective in language modeling but are limited by high computational and memory demands that grow quadratically with input sequence length. State space models (SSMs) offer a promising alternative by reducing…

Hardware Architecture · Computer Science 2025-08-06 Dongho Yoon, Gungyu Lee, Jaewon Chang, Yunjae Lee, Dongjae Lee, Minsoo Rhu