Related papers: BitLM: Unlocking Multi-Token Language Generation w…

Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion

Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in sample quality and diversity. Recent continuous flow and diffusion…

Computation and Language · Computer Science · 2026-05-11 · Georgios Batzolis, Mark Girolami, Luca Ambrogioni

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation by combining LLM and diffusion models, the state-of-the-art in each task, respectively. Existing approaches rely on spatial visual…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-22 · Kaihang Pan, Wang Lin, Zhongqi Yue, Tenglong Ao, Liyu Jia, Wei Zhao, Juncheng Li, Siliang Tang, Hanwang Zhang

WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference

Autoregressive (AR) generation is the standard decoding paradigm for Large Language Models (LLMs), but its token-by-token nature limits parallelism at inference time. Diffusion Language Models (DLLMs) offer parallel decoding by recovering…

Computation and Language · Computer Science · 2025-12-30 · Aiwei Liu, Minghua He, Shaoxun Zeng, Sijun Zhang, Linhao Zhang, Chuhan Wu, Wei Jia, Yuan Liu, Xiao Zhou, Jie Zhou

Fast Byte Latent Transformer

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the…

Computation and Language · Computer Science · 2026-05-11 · Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz, Gargi Ghosh, Luke Zettlemoyer, Christopher Potts, Xiaochuang Han, Srinivasan Iyer

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science · 2025-12-08 · Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent…

Computation and Language · Computer Science · 2026-05-11 · Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, Dmitry Vetrov

Reasoning with Latent Tokens in Diffusion Language Models

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference…

Machine Learning · Computer Science · 2026-02-04 · Andre He, Sean Welleck, Daniel Fried

From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models

Diffusion Language Models (DLMs) have recently emerged as a strong alternative to autoregressive language models (LMs). DLMs offer comparable accuracy with faster inference speed via parallel decoding. However, standard DLM decoding…

Machine Learning · Computer Science · 2025-11-27 · Hengyu Fu, Baihe Huang, Virginia Adams, Charles Wang, Venkat Srinivasan, Jiantao Jiao

Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching

Diffusion language models (DLMs) generate text through iterative denoising, but inference requires full-sequence attention at every iteration, resulting in substantial redundant computation on masked tokens. Block-wise diffusion can reduce…

Machine Learning · Computer Science · 2026-02-03 · Fengrui Zuo, Zhiwei Ke, Yiming Liu, Wenqi Lou, Chao Wang, Xuehai Zhou

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a…

Computation and Language · Computer Science · 2025-10-27 · Yeongbin Seo, Dongha Lee, Jaehyung Kim, Jinyoung Yeo

Diffusion-LM Improves Controllable Text Generation

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there…

Computation and Language · Computer Science · 2022-05-31 · Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding

As large language models (LLMs) scale up, accuracy improves, but the autoregressive (AR) nature of decoding increases latency since each token requires a serial forward pass. Speculative decoding addresses this by employing a fast drafter…

Computation and Language · Computer Science · 2025-10-06 · Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang

Flow Map Language Models: One-step Language Modeling via Continuous Denoising

Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply…

Computation and Language · Computer Science · 2026-05-21 · Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M. Boffi, Jinwoo Kim

Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants

The paradigm of Large Language Models (LLMs) is currently defined by auto-regressive (AR) architectures, which generate text through a sequential "brick-by-brick" process. Despite their success, AR models are inherently constrained by a…

Computation and Language · Computer Science · 2026-01-21 · Yunhe Wang, Kai Han, Huiling Zhen, Yuchuan Tian, Hanting Chen, Yongbing Huang, Yufei Cui, Yingte Shu, Shan Gao, Ismail Elezi, Roy Vaughan Miles, Songcen Xu, Feng Wen, Chao Xu, Sinan Zeng, Dacheng Tao

Efficient numeracy in language models through single-token number embeddings

To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or…

Machine Learning · Computer Science · 2026-05-21 · Linus Kreitner, Paul Hager, Jonathan Mengedoht, Georgios Kaissis, Daniel Rueckert, Martin J. Menten

Multimodal Latent Language Modeling with Next-Token Diffusion

Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly…

Computation and Language · Computer Science · 2024-12-12 · Yutao Sun, Hangbo Bao, Wenhui Wang, Zhiliang Peng, Li Dong, Shaohan Huang, Jianyong Wang, Furu Wei

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the…

Computation and Language · Computer Science · 2026-01-19 · Linhao Zhong, Linyu Wu, Bozhen Fang, Tianjian Feng, Chenchen Jing, Wen Wang, Jiaheng Zhang, Hao Chen, Chunhua Shen

Bolmo: Byteifying the Next Generation of Language Models

Recent advances in generative AI have been largely driven by large language models (LLMs), deep neural networks that operate over discrete units called tokens. To represent text, the vast majority of LLMs use words or word fragments as the…

Computation and Language · Computer Science · 2026-02-10 · Benjamin Minixhofer, Tyler Murray, Tomasz Limisiewicz, Anna Korhonen, Luke Zettlemoyer, Noah A. Smith, Edoardo M. Ponti, Luca Soldaini, Valentin Hofmann

Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially,…

Machine Learning · Computer Science · 2026-01-09 · Gen Li, Changxiao Cai

Soft-Masked Diffusion Language Models

Diffusion models have demonstrated strong potential in language modeling, offering various advantages over traditional autoregressive approaches. Their ability to generate and revise entire responses in parallel enables faster generation…

Machine Learning · Computer Science · 2026-03-03 · Michael Hersche, Samuel Moor-Smith, Thomas Hofmann, Abbas Rahimi