Related papers: Soft-Masked Diffusion Language Models

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the…

Computation and Language · Computer Science 2026-01-19 Linhao Zhong, Linyu Wu, Bozhen Fang, Tianjian Feng, Chenchen Jing, Wen Wang, Jiaheng Zhang, Hao Chen, Chunhua Shen

Masked Diffusion Language Models with Frequency-Informed Training

We present a masked diffusion language modeling framework for data-efficient training for the BabyLM 2025 Challenge. Our approach applies diffusion training objectives to language modeling under strict data constraints, incorporating…

Computation and Language · Computer Science 2025-09-08 Despoina Kosmopoulou, Efthymios Georgiou, Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation

Semantic segmentation is essential in computer vision for various applications, yet traditional approaches face significant challenges, including the high cost of annotation and extensive training for supervised learning. Additionally, due…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Yasufumi Kawano, Yoshimitsu Aoki

Simple and Effective Masked Diffusion Language Models

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete…

Computation and Language · Computer Science 2024-11-12 Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once…

Machine Learning · Computer Science 2026-03-06 Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Michael Elad, Volodymyr Kuleshov

Training-Free Self-Correction for Multimodal Masked Diffusion Models

Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error…

Machine Learning · Statistics 2026-02-04 Yidong Ouyang, Panwen Hu, Zhengyan Wan, Zhe Wang, Liyan Xie, Dmitriy Bespalov, Ying Nian Wu, Guang Cheng, Hongyuan Zha, Qiang Sun

Dream 7B: Diffusion Large Language Models

We introduce Dream 7B, the most powerful open diffusion large language model to date. Unlike autoregressive (AR) models that generate tokens sequentially, Dream 7B employs discrete diffusion modeling to refine sequences in parallel through…

Computation and Language · Computer Science 2025-08-22 Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Theoretical Benefit and Limitation of Diffusion Language Model

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each…

Machine Learning · Computer Science 2025-06-10 Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models

Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Kushal Patel, Tanush Mittal, Arjun Laxman

Understanding and Accelerating the Training of Masked Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling…

Machine Learning · Computer Science 2026-05-14 Chunsan Hong, Sanghyun Lee, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Yuki Mitsufuji, Seungryong Kim, Jong Chul Ye

Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models

Because diffusion models have shown impressive performances in a number of tasks, such as image synthesis, there is a trend in recent works to prove (with certain assumptions) that these models have strong approximation capabilities. In…

Machine Learning · Computer Science 2024-01-19 Yangming Li, Boris van Breugel, Mihaela van der Schaar

Corrective Diffusion Language Models

While Diffusion Language Models (DLMs) are theoretically well-suited for iterative refinement due to their non-causal structure, they often fail to reliably revise incorrect tokens in practice. The key challenge lies in the model's…

Machine Learning · Computer Science 2026-01-30 Shuibai Zhang, Fred Zhangzhi Peng, Yiheng Zhang, Jin Pan, Grigorios G. Chrysos

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences…

Machine Learning · Computer Science 2025-10-23 Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, Rahul G. Krishnan

Diffusion Language Models for Speech Recognition

Diffusion language models have recently emerged as a leading alternative to standard language models, due to their ability for bidirectional attention and parallel text generation. In this work, we explore variants for their use in speech…

Computation and Language · Computer Science 2026-04-30 Davyd Naveriani, Albert Zeyer, Ralf Schlüter, Hermann Ney

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li

Remasking Discrete Diffusion Models with Inference-Time Scaling

Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion lacks this capability: when a token is…

Machine Learning · Computer Science 2026-02-10 Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, Volodymyr Kuleshov

Investigating the Design Space of Diffusion Models for Speech Enhancement

Diffusion models are a new class of generative models that have shown outstanding performance in image generation literature. As a consequence, studies have attempted to apply diffusion models to other tasks, such as speech enhancement. A…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-10 Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May