Related papers: Dynamic Vision Mamba

Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing

State Space Models (SSMs) with selective scan (Mamba) have been adapted into efficient vision models. Mamba, unlike Vision Transformers, achieves linear complexity for token interactions through a recurrent hidden state process. This…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Saarthak Kapse, Robin Betz, Srinivasan Sivanandan

LocalMamba: Visual State Space Model with Windowed Selective Scan

Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba

Vision Mamba has emerged as a promising and efficient alternative to Vision Transformers, yet its efficiency remains fundamentally constrained by the number of input tokens. Existing token reduction approaches typically adopt token pruning…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Shanhui Liu, Rui Xu, Yunke Wang

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Recently the state space models (SSMs) with efficient hardware-aware designs, i.e., the Mamba deep learning model, have shown great potential for long sequence modeling. Meanwhile building efficient and generic vision backbones purely upon…

Computer Vision and Pattern Recognition · Computer Science 2024-11-15 Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang

Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training

Vision Mamba has shown close to state of the art performance on computer vision tasks, drawing much interest in increasing its efficiency. A promising approach is token reduction (that has been successfully implemented in ViTs). Pruning…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Mingjia Shi, Yuhao Zhou, Ruiji Yu, Zekai Li, Zhiyuan Liang, Xuanlei Zhao, Xiaojiang Peng, Shanmukha Ramakrishna Vedantam, Wangbo Zhao, Kai Wang, Yang You

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Prior efforts in light-weight model development mainly centered on CNN and Transformer-based designs yet faced persistent challenges. CNNs adept at local feature extraction compromise resolution while Transformers offer global reach but…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Xiaohuan Pei, Tao Huang, Chang Xu

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs)…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, Zitong Yu

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic to the number of tokens, leading to significant…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

Training-free Token Reduction for Vision Mamba

Vision Mamba has emerged as a strong competitor to Vision Transformers (ViTs) due to its ability to efficiently capture long-range dependencies with linear computational complexity. While token reduction, an effective compression technique…

Computer Vision and Pattern Recognition · Computer Science 2025-07-21 Qiankun Ma, Ziyao Zhang, Chi Su, Jie Chen, Zhen Song, Hairong Zheng, Wen Gao

DefMamba: Deformable Visual State Space Model

Recently, state space models (SSM), particularly Mamba, have attracted significant attention from scholars due to their ability to effectively balance computational efficiency and performance. However, most existing visual Mamba methods…

Computer Vision and Pattern Recognition · Computer Science 2025-04-09 Leiye Liu, Miao Zhang, Jihao Yin, Tingwei Liu, Wei Ji, Yongri Piao, Huchuan Lu

Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision Models

Spiking Transformers have shown strong potential for long-range visual modeling through spike-driven self-attention. However, their quadratic token interactions remain fundamentally misaligned with the sparse and event-driven nature of…

Computer Vision and Pattern Recognition · Computer Science 2026-04-29 Dewei Bai, Hongxiang Peng, Yunyun Zeng, Ziyu Zhang, Hong Qu, Yi Zhang

LoG-VMamba: Local-Global Vision Mamba for Medical Image Segmentation

Mamba, a State Space Model (SSM), has recently shown competitive performance to Convolutional Neural Networks (CNNs) and Transformers in Natural Language Processing and general sequence modeling. Various attempts have been made to adapt…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Trung Dinh Quoc Dang, Huy Hoang Nguyen, Aleksei Tiulpin

MVSMamba: Multi-View Stereo with State Space Model

Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-range dependencies based on local features…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Jianfei Jiang, Qiankun Liu, Hongyuan Liu, Haochen Yu, Liyong Wang, Jiansheng Chen, Huimin Ma

LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and…

Robotics · Computer Science 2025-12-16 Yinuo Wang, Gavin Tao

MFil-Mamba: Multi-Filter Scanning for Spatial Redundancy-Aware Visual State Space Models

State Space Models (SSMs), especially the recent Mamba architecture, have achieved remarkable success in sequence modeling tasks. However, extending SSMs to computer vision remains challenging due to the non-sequential structure of visual data…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Puskal Khadka, KC Santosh

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Shentong Mo, Yapeng Tian

SF-Mamba: Rethinking State Space Model for Vision

The realm of Mamba for vision has been advanced in recent years to strike for the alternatives of Vision Transformers (ViTs) that suffer from the quadratic complexity. While the recurrent scanning mechanism of Mamba offers computational…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Masakazu Yoshimura, Teruaki Hayashi, Yuki Hoshino, Wei-Yao Wang, Takeshi Ohashi

Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-08 Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are…

Image and Video Processing · Electrical Eng. & Systems 2024-05-14 Xiaoyan Lei, Wenlong Zhang, Weifeng Cao

DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation

We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Hao Phung, Quan Dao, Trung Dao, Hoang Phan, Dimitris Metaxas, Anh Tran