Related papers: DiRL: An Efficient Post-Training Framework for Dif…

Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

We propose TraceRL, a trajectory-aware reinforcement learning framework for diffusion language models (DLMs) that incorporates preferred inference trajectory into post-training, and is applicable across different architectures. Equipped…

Computation and Language · Computer Science 2025-09-09 Yinjie Wang, Ling Yang, Bowen Li, Ye Tian, Ke Shen, Mengdi Wang

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu, Xinyin Ma, Xinchao Wang

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

Simple Policy Gradients for Reasoning with Diffusion Language Models

Diffusion large language models (dLLMs), which offer a promising alternative to traditional autoregressive LLMs, have recently shown strong results in pretraining. However, due to their lack of tractable sequence-level likelihoods, they…

Machine Learning · Computer Science 2026-02-03 Anthony Zhan

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR)…

Computation and Language · Computer Science 2025-06-04 Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization

Diffusion large language models (dLLMs) are promising alternatives to autoregressive large language models (AR-LLMs), as they potentially allow higher inference throughput. Reinforcement learning (RL) is a crucial component for dLLMs to…

Machine Learning · Computer Science 2026-02-24 Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Diffusion-based decoding has recently emerged as an appealing alternative to autoregressive (AR) generation, offering the potential to update multiple tokens in parallel and reduce latency. However, diffusion vision language models (dVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Lunbin Zeng, Jingfeng Yao, Bencheng Liao, Hongyuan Tao, Wenyu Liu, Xinggang Wang

MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models

Diffusion language models, as a promising alternative to traditional autoregressive (AR) models, enable faster generation and richer conditioning on bidirectional context. However, they suffer from a key discrepancy between training and…

Machine Learning · Computer Science 2025-09-26 Haoyu He, Katrin Renz, Yong Cao, Andreas Geiger

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Computation and Language · Computer Science 2026-05-01 Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov

Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Zejian Li, Yize Li, Chenye Meng, Zhongni Liu, Yang Ling, Shengyuan Zhang, Guang Yang, Changyuan Yang, Zhiyuan Yang, Lingyun Sun

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting, which we call trajectory locking:…

Machine Learning · Computer Science 2026-05-15 Saba Ahmadi, Prasanna Parthasarathi, Yufei Cui

Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios

In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market…

Machine Learning · Statistics 2025-10-09 Himanshu Choudhary, Arishi Orra, Manoj Thakur

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

Reinforcement Learning (RL) has become the most effective post-training approach for improving the capabilities of Large Language Models (LLMs). In practice, because of the high demands on latency and memory, it is particularly challenging…

Machine Learning · Computer Science 2025-06-03 Bo Wu, Sid Wang, Yunhao Tang, Jia Ding, Eryk Helenowski, Liang Tan, Tengyu Xu, Tushar Gowda, Zhengxing Chen, Chen Zhu, Xiaocheng Tang, Yundi Qian, Beibei Zhu, Rui Hou

Post-training Large Language Models for Diverse High-Quality Responses

Reinforcement learning (RL) has emerged as a popular method for post-training large language models (LLMs). While improving the model's performance on downstream tasks, it often reduces the model's output diversity, leading to narrow,…

Computation and Language · Computer Science 2026-03-03 Yilei Chen, Souradip Chakraborty, Lorenz Wolf, Yannis Paschalidis, Aldo Pacchiano

Reasoning in Diffusion Large Language Models is Concentrated in Dynamic Confusion Zones

Diffusion Large Language Models (dLLMs) are rapidly emerging alongside autoregressive models as a powerful paradigm for complex reasoning, with reinforcement learning increasingly used for downstream alignment. Existing trajectory-based RL…

Machine Learning · Computer Science 2025-11-20 Ranfei Chen, Ming Chen, Kaifei Wang

Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning…

Machine Learning · Computer Science 2026-02-12 Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Reinforcement learning (RL) is pivotal for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, existing dLLM policy optimization methods suffer from two critical reliability bottlenecks: (1) reward…

Computation and Language · Computer Science 2026-05-14 Leyi Pan, Shuchang Tao, Yunpeng Zhai, Zheyu Fu, Liancheng Fang, Minghua He, Lingzhe Zhang, Zhaoyang Liu, Bolin Ding, Aiwei Liu, Lijie Wen

Discrete Diffusion in Large Language and Multimodal Models: A Survey

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel…

Machine Learning · Computer Science 2025-09-22 Runpeng Yu, Qi Li, Xinchao Wang

Dichotomous Diffusion Policy Optimization

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…

Machine Learning · Computer Science 2026-02-03 Ruiming Liang, Yinan Zheng, Kexin Zheng, Tianyi Tan, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang, Xianyuan Zhan