Related papers: Dimba: Transformer-Mamba Diffusion Models

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic to the number of tokens, leading to significant…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-11 · Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-28 · Shentong Mo, Yapeng Tian

Mamba-ST: State Space Model for Efficient Style Transfer

The goal of style transfer is, given a content image and a style source, generating a new image preserving the content but with the artistic representation of the style source. Most of the state-of-the-art architectures use transformers or…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Filippo Botti, Alex Ergasti, Leonardo Rossi, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati

Jamba: A Hybrid Transformer-Mamba Language Model

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model…

Computation and Language · Computer Science · 2024-07-04 · Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham

DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation

We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-14 · Hao Phung, Quan Dao, Trung Dao, Hoang Phan, Dimitris Metaxas, Anh Tran

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Transformer-based architectures have become the backbone of both uni-modal and multi-modal foundation models, largely due to their scalability via attention mechanisms, resulting in a rich ecosystem of publicly available pre-trained models…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-10 · Xiuwei Chen, Wentao Hu, Xiao Dong, Sihao Lin, Zisheng Chen, Meng Cao, Yina Zhuang, Jianhua Han, Hang Xu, Xiaodan Liang

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-26 · Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Schusterbauer, Björn Ommer

Physics-informed Diffusion Mamba Transformer for Real-world Driving

Autonomous driving systems demand trajectory planners that not only model the inherent uncertainty of future motions but also respect complex temporal dependencies and underlying physical laws. While diffusion-based generative models excel…

Robotics · Computer Science · 2026-02-03 · Hang Zhou, Qiang Zhang, Peiran Liu, Yihao Qin, Zhaoxu Yan, Yiding Ji

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

In recent years, Transformers have become the de-facto architecture for sequence modeling on text and a variety of multi-dimensional data, such as images and video. However, the use of self-attention layers in a Transformer incurs…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Shufan Li, Harkanwar Singh, Aditya Grover

DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention or KV-cache overhead. We…

Machine Learning · Computer Science · 2026-03-02 · Vaibhav Singh, Oleksiy Ostapenko, Pierre-André Noël, Eugene Belilovsky, Torsten Scholak

Contrast: A Hybrid Architecture of Transformers and State Space Models for Low-Level Vision

Transformers have become increasingly popular for image super-resolution (SR) tasks due to their strong global context modeling capabilities. However, their quadratic computational complexity necessitates the use of window-based attention…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Aman Urumbekov, Zheng Chen

End-to-End Multi-Modal Diffusion Mamba

Current end-to-end multi-modal models utilize different encoders and decoders to process input and output information. This separation hinders the joint representation learning of various modalities. To unify multi-modal processing, we…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-20 · Chunhao Lu, Qiang Lu, Meichen Dong, Jake Luo

MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation

Image generation models have encountered challenges related to scalability and quadratic complexity, primarily due to the reliance on Transformer-based backbones. In this study, we introduce MaskMamba, a novel hybrid model that combines…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-01 · Wenchao Chen, Liqiang Niu, Ziyao Lu, Fandong Meng, Jie Zhou

A Survey of Mamba

As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning.…

Machine Learning · Computer Science · 2026-04-07 · Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Hui Liu, Xin Xu, Qing Li

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fusion images. However, existing feature extraction and fusion methods are either constrained by inherent local reduction bias…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-06 · Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity,…

Machine Learning · Computer Science · 2026-01-08 · Yixing Li, Ruobing Xie, Zhen Yang, Xingwu Sun, Shuaipeng Li, Weidong Han, Zhanhui Kang, Yu Cheng, Chengzhong Xu, Di Wang, Jie Jiang

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

Recent advancements in sequence modeling have led to the development of the Mamba architecture, noted for its selective state space approach, offering a promising avenue for efficient long sequence handling. However, its application in 3D…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-10 · Shentong Mo

Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation

U-shaped architectures have long dominated the field of medical image segmentation, while Transformers are widely employed for modeling long-range dependencies. The former typically handles scale variations implicitly by aggregating…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Yanhua Zhang, Ke Zhang, Jingyu Wang, Gabriella Balestra, Samanta Rosati, Yulin Wu, Wuwei Wang, Valentina Giannini

HybridTM: Combining Transformer and Mamba for 3D Semantic Segmentation

Transformer-based methods have demonstrated remarkable capabilities in 3D semantic segmentation through their powerful attention mechanisms, but the quadratic complexity limits their modeling of long-range dependencies in large-scale point…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-25 · Xinyu Wang, Jinghua Hou, Zhe Liu, Yingying Zhu

Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing

We introduce Llamba, a family of efficient recurrent language models distilled from Llama-3.x into the Mamba architecture. The series includes Llamba-1B, Llamba-3B, and Llamba-8B, which achieve higher inference throughput and handle…

Machine Learning · Computer Science · 2025-02-25 · Aviv Bick, Tobias Katsch, Nimit Sohoni, Arjun Desai, Albert Gu