Related papers: Unifying Autoregressive and Diffusion-Based Sequen…

FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu , Jian Meng , Yash Akhauri , Mohamed S. Abdelfattah , Jae-sun Seo , Zhiru Zhang , Udit Gupta

WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference

Autoregressive (AR) generation is the standard decoding paradigm for Large Language Models (LLMs), but its token-by-token nature limits parallelism at inference time. Diffusion Language Models (DLLMs) offer parallel decoding by recovering…

Computation and Language · Computer Science 2025-12-30 Aiwei Liu , Minghua He , Shaoxun Zeng , Sijun Zhang , Linhao Zhang , Chuhan Wu , Wei Jia , Yuan Liu , Xiao Zhou , Jie Zhou

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li , Jian Guan , Wei Wu , Chongxuan Li

Promises, Outlooks and Challenges of Diffusion Language Modeling

The modern autoregressive Large Language Models (LLMs) have achieved outstanding performance on NLP benchmarks, and they are deployed in the real world. However, they still suffer from limitations of the autoregressive training paradigm.…

Computation and Language · Computer Science 2024-07-11 Justin Deschenaux , Caglar Gulcehre

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li , Mingda Chen , Bowei Guo , Zhiqiang Shen

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation by combining LLM and diffusion models, the state-of-the-art in each task, respectively. Existing approaches rely on spatial visual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-22 Kaihang Pan , Wang Lin , Zhongqi Yue , Tenglong Ao , Liyu Jia , Wei Zhao , Juncheng Li , Siliang Tang , Hanwang Zhang

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Autoregressive (AR) Large Language Models (LLMs) have demonstrated significant success across numerous tasks. However, the AR modeling paradigm presents certain limitations; for instance, contemporary autoregressive LLMs are trained to…

Machine Learning · Computer Science 2025-02-10 Justin Deschenaux , Caglar Gulcehre

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work,…

Machine Learning · Computer Science 2025-05-20 Marianne Arriola , Aaron Gokaslan , Justin T. Chiu , Zhihan Yang , Zhixuan Qi , Jiaqi Han , Subham Sekhar Sahoo , Volodymyr Kuleshov

Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science 2025-05-29 Bocheng Li , Zhujin Gao , Linli Xu

Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation

LLMs have become the mainstream approaches to code generation. Existing LLMs mainly employ autoregressive generation, i.e. generating code token-by-token from left to right. However, the underlying autoregressive generation has two…

Software Engineering · Computer Science 2025-11-04 Chengze Li , Yitong Zhang , Jia Li , Liyi Cai , Ge Li

RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction

Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, operating diffusion processes on continuous VAE latent, which significantly differ from the text generation methods employed by…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Xiaoping Wu , Jie Hu , Xiaoming Wei

Discrete Diffusion in Large Language and Multimodal Models: A Survey

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel…

Machine Learning · Computer Science 2025-09-22 Runpeng Yu , Qi Li , Xinchao Wang

Sequential Diffusion Language Models

Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value (KV) caches. Block diffusion mitigates these issues, yet still enforces a fixed block size and…

Computation and Language · Computer Science 2025-09-30 Yangzhou Liu , Yue Cao , Hao Li , Gen Luo , Zhe Chen , Weiyun Wang , Xiaobo Liang , Biqing Qi , Lijun Wu , Changyao Tian , Yanting Zhang , Yuqiang Li , Tong Lu , Yu Qiao , Jifeng Dai , Wenhai Wang

SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers

Diffusion model, a new generative modelling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of text, it is not trivial to extend continuous diffusion models to…

Computation and Language · Computer Science 2023-05-23 Hongyi Yuan , Zheng Yuan , Chuanqi Tan , Fei Huang , Songfang Huang

Sequential Data Generation with Groupwise Diffusion Process

We present the Groupwise Diffusion Model (GDM), which divides data into multiple groups and diffuses one group at one time interval in the forward diffusion process. GDM generates data sequentially from one group at one time interval,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-03 Sangyun Lee , Gayoung Lee , Hyunsu Kim , Junho Kim , Youngjung Uh

Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially,…

Machine Learning · Computer Science 2026-01-09 Gen Li , Changxiao Cai

On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond

Diffusion language models have recently emerged as a competitive alternative to autoregressive language models. Beyond next-token generation, they are more efficient and flexible by enabling parallel and any-order token generation. However,…

Machine Learning · Computer Science 2025-11-18 Chenxiao Yang , Cai Zhou , David Wipf , Zhiyuan Li

CDLM: Consistency Diffusion Language Models For Faster Sampling

Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language…

Machine Learning · Computer Science 2026-02-23 Minseo Kim , Chenfeng Xu , Coleman Hooper , Harman Singh , Ben Athiwaratkun , Ce Zhang , Kurt Keutzer , Amir Gholami

RDM: Recurrent Diffusion Model for Human Motion Generation

Human motion generation is a challenging task due to its high dimensionality and the difficulty of generating fine-grained motions. Diffusion methods have been proposed due to their high sample quality and expressiveness. Early approaches…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Mirgahney Mohamed , Harry Jake Cunningham , Marc P. Deisenroth , Lourdes Agapito

TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation

Time series generation is a crucial research topic in the area of decision-making systems, which can be particularly important in domains like autonomous driving, healthcare, and, notably, robotics. Recent approaches focus on learning in…

Machine Learning · Computer Science 2024-09-16 Jian Qian , Bingyu Xie , Biao Wan , Minhao Li , Miao Sun , Patrick Yin Chiang