Related papers: DiRL: An Efficient Post-Training Framework for Dif…

200 papers

We propose TraceRL, a trajectory-aware reinforcement learning framework for diffusion language models (DLMs) that incorporates preferred inference trajectory into post-training, and is applicable across different architectures. Equipped…

Computation and Language · Computer Science 2025-09-09 Yinjie Wang, Ling Yang, Bowen Li, Ye Tian, Ke Shen, Mengdi Wang

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu, Xinyin Ma, Xinchao Wang

We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

Diffusion large language models (dLLMs), which offer a promising alternative to traditional autoregressive LLMs, have recently shown strong results in pretraining. However, due to their lack of tractable sequence-level likelihoods, they…

Machine Learning · Computer Science 2026-02-03 Anthony Zhan

Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR)…

Computation and Language · Computer Science 2025-06-04 Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Diffusion large language models (dLLMs) are promising alternatives to autoregressive large language models (AR-LLMs), as they potentially allow higher inference throughput. Reinforcement learning (RL) is a crucial component for dLLMs to…

Machine Learning · Computer Science 2026-02-24 Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

Diffusion-based decoding has recently emerged as an appealing alternative to autoregressive (AR) generation, offering the potential to update multiple tokens in parallel and reduce latency. However, diffusion vision language models (dVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Lunbin Zeng, Jingfeng Yao, Bencheng Liao, Hongyuan Tao, Wenyu Liu, Xinggang Wang

Diffusion language models, as a promising alternative to traditional autoregressive (AR) models, enable faster generation and richer conditioning on bidirectional context. However, they suffer from a key discrepancy between training and…

Machine Learning · Computer Science 2025-09-26 Haoyu He, Katrin Renz, Yong Cao, Andreas Geiger

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Zejian Li, Yize Li, Chenye Meng, Zhongni Liu, Yang Ling, Shengyuan Zhang, Guang Yang, Changyuan Yang, Zhiyuan Yang, Lingyun Sun

Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking:…

Machine Learning · Computer Science 2026-05-15 Saba Ahmadi, Prasanna Parthasarathi, Yufei Cui

In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market…

Machine Learning · Statistics 2025-10-09 Himanshu Choudhary, Arishi Orra, Manoj Thakur

Reinforcement Learning (RL) has become the most effective post-training approach for improving the capabilities of Large Language Models (LLMs). In practice, because of the high demands on latency and memory, it is particularly challenging…

Reinforcement learning (RL) has emerged as a popular method for post-training large language models (LLMs). While improving the model's performance on downstream tasks, it often reduces the model's output diversity, leading to narrow,…

Computation and Language · Computer Science 2026-03-03 Yilei Chen, Souradip Chakraborty, Lorenz Wolf, Yannis Paschalidis, Aldo Pacchiano

Diffusion Large Language Models (dLLMs) are rapidly emerging alongside autoregressive models as a powerful paradigm for complex reasoning, with reinforcement learning increasingly used for downstream alignment. Existing trajectory-based RL…

Machine Learning · Computer Science 2025-11-20 Ranfei Chen, Ming Chen, Kaifei Wang

Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning…

Machine Learning · Computer Science 2026-02-12 Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

Reinforcement learning (RL) is pivotal for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, existing dLLM policy optimization methods suffer from two critical reliability bottlenecks: (1) reward…

Computation and Language · Computer Science 2026-05-14 Leyi Pan, Shuchang Tao, Yunpeng Zhai, Zheyu Fu, Liancheng Fang, Minghua He, Lingzhe Zhang, Zhaoyang Liu, Bolin Ding, Aiwei Liu, Lijie Wen

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel…

Machine Learning · Computer Science 2025-09-22 Runpeng Yu, Qi Li, Xinchao Wang

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…
