Related papers: DiffuMamba: High-Throughput Diffusion LMs with Mam…

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic in the number of tokens, leading to significant…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba

Recent Transformer-based diffusion models have shown remarkable performance, largely attributed to the ability of the self-attention mechanism to accurately capture both global and local contexts by computing all-pair interactions among…

Computer Vision and Pattern Recognition · Computer Science 2024-09-20 Yunxiang Fu, Chaoqi Chen, Yizhou Yu

Dimba: Transformer-Mamba Diffusion Models

This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba's sequentially stacked blocks alternate between Transformer and Mamba…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the…

Machine Learning · Computer Science 2025-06-30 Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity,…

Machine Learning · Computer Science 2026-01-08 Yixing Li, Ruobing Xie, Zhen Yang, Xingwu Sun, Shuaipeng Li, Weidong Han, Zhanhui Kang, Yu Cheng, Chengzhong Xu, Di Wang, Jie Jiang

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Shentong Mo, Yapeng Tian

Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing

We introduce Llamba, a family of efficient recurrent language models distilled from Llama-3.x into the Mamba architecture. The series includes Llamba-1B, Llamba-3B, and Llamba-8B, which achieve higher inference throughput and handle…

Machine Learning · Computer Science 2025-02-25 Aviv Bick, Tobias Katsch, Nimit Sohoni, Arjun Desai, Albert Gu

FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta

DLLM Agent: See Farther, Run Faster

Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic multi-step decision making remain underexplored. We…

Computation and Language · Computer Science 2026-03-23 Huiling Zhen, Weizhe Lin, Renxi Liu, Kai Han, Yiming Li, Yuchuan Tian, Hanting Chen, Xiaoguang Li, Xiaosong Li, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Youliang Yan, Peifeng Qin, Jun Wang, Yu Wang, Dacheng Tao, Yunhe Wang

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Computation and Language · Computer Science 2026-05-01 Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

Recent advancements in sequence modeling have led to the development of the Mamba architecture, noted for its selective state space approach, offering a promising avenue for efficient long sequence handling. However, its application in 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-06-10 Shentong Mo

A Survey of Mamba

As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning.…

Machine Learning · Computer Science 2026-04-07 Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Hui Liu, Xin Xu, Qing Li

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu, Xinyin Ma, Xinchao Wang

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang, Chenkai Xu, Yijie Jin, Jiachun Jin, Hao Zhang, Zhijie Deng

Differential Mamba

Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hallucinations, weakening long-range and retrieval…

Machine Learning · Computer Science 2025-10-30 Nadav Schneider, Itamar Zimerman, Eliya Nachmani

DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models

Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Kushal Patel, Tanush Mittal, Arjun Laxman

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention,…

Machine Learning · Computer Science 2024-06-03 Albert Gu, Tri Dao

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Effective reasoning is crucial to solving complex mathematical problems. Recent large language models (LLMs) have boosted performance by scaling test-time computation through long chain-of-thought reasoning. However, transformer-based…

Machine Learning · Computer Science 2025-09-10 Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited…

Computation and Language · Computer Science 2025-01-03 Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao