Related papers: Looped Diffusion Language Models

200 papers

Masked diffusion models (MDMs) have shown promise in language modeling, yet their scalability and effectiveness in core language tasks, such as text generation and language understanding, remain underexplored. This paper establishes the…

Artificial Intelligence · Computer Science 2025-03-03 Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each…

Machine Learning · Computer Science 2025-06-10 Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling…

Machine Learning · Computer Science 2026-05-14 Chunsan Hong, Sanghyun Lee, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Yuki Mitsufuji, Seungryong Kim, Jong Chul Ye

Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but generation quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise…

Machine Learning · Computer Science 2026-05-22 Chunsan Hong, Sanghyun Lee, Jong Chul Ye

Masked diffusion models (MDMs) for text offer a compelling alternative to traditional autoregressive language models. Parallel generation makes them efficient, but their computational capabilities and the limitations inherent in their…

Machine Learning · Computer Science 2026-04-28 Anej Svete, Ashish Sabharwal

Masked diffusion language models (MDMs) have recently gained traction as a viable generative framework for natural language. This can be attributed to its scalability and ease of training compared to other diffusion model paradigms for…

Computation and Language · Computer Science 2025-08-19 Tejomay Kishor Padole, Suyash P Awate, Pushpak Bhattacharyya

Recent Speech Large Language Models (LLMs) have achieved impressive capabilities in end-to-end speech interaction. However, the prevailing autoregressive paradigm imposes strict serial constraints, limiting generation efficiency and…

Computation and Language · Computer Science 2026-02-10 Ziyang Cheng, Yuhao Wang, Heyang Liu, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang

Looped language models repeat a set of transformer layers through depth, reducing memory costs and providing natural early-exit points at loop boundaries. However, looped models do not scale as favorably as standard transformers with unique…

Machine Learning · Computer Science 2026-05-12 Ryan Lee, Jacob Biloki, Edward J. Hu, Jonathan May

Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on…

Machine Learning · Computer Science 2025-09-09 Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich, Yilun Du, Sham Kakade, Timothy Ngotiaoco, Sitan Chen, Michael Albergo

Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages of latent reasoning with looped transformers…

Artificial Intelligence · Computer Science 2026-05-13 Cai Zhou, Chenxiao Yang, Yi Hu, Chenyu Wang, Chubin Zhang, Muhan Zhang, Lester Mackey, Tommi Jaakkola, Stephen Bates, Dinghuai Zhang

Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data, thanks to their superior performance over other discrete diffusion models, and are rivaling the auto-regressive models (ARMs)…

Machine Learning · Computer Science 2025-05-01 Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang

Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. By generating sequences in any order and allowing for parallel decoding, they enable fast inference and strong performance on…

Machine Learning · Computer Science 2026-02-12 Jaeyeon Kim, Jonathan Geuter, David Alvarez-Melis, Sham Kakade, Sitan Chen

Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective way to overcome the limitations of sequential generation. Yet it remains unclear whether post-trained MDMs acquire…

Machine Learning · Computer Science 2026-05-29 Injin Kong, Hyoungjoon Lee, Yohan Jo

Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences…

Machine Learning · Computer Science 2025-10-23 Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, Rahul G. Krishnan

Masked Diffusion Models (MDMs) have emerged as one of the most promising paradigms for generative modeling over discrete domains. It is known that MDMs effectively train to decode tokens in a random order, and that this ordering has…

Machine Learning · Computer Science 2025-11-25 Prateek Garg, Bhavya Kohli, Sunita Sarawagi

As a class of fruitful approaches, diffusion probabilistic models (DPMs) have shown excellent advantages in high-resolution image reconstruction. On the other hand, masked autoencoders (MAEs), as popular self-supervised vision learners,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou

Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Kushal Patel, Tanush Mittal, Arjun Laxman

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and,…

Machine Learning · Computer Science 2026-04-14 Ivan Sedykh, Nikita Sorokin, Valentin Malykh

Diffusion language models are a promising alternative to autoregressive models due to their potential for faster generation. Among discrete diffusion approaches, masked diffusion currently dominates, largely driven by strong perplexity on…

Machine Learning · Computer Science 2026-02-17 Subham Sekhar Sahoo, Jean-Marie Lemercier, Zhihan Yang, Justin Deschenaux, Jingyu Liu, John Thickstun, Ante Jukic