Related papers: Revolutionizing Reinforcement Learning Framework f…

200 papers

Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. While recent efforts have validated their pre-training potential and accelerated inference speeds, the post-training landscape for…

Machine Learning · Computer Science 2026-01-07 Ying Zhu, Jiaxin Wan, Xiaoran Liu, Siyang He, Qiqi Wang, Xu Guo, Tianyi Liang, Zengfeng Huang, Ziwei He, Xipeng Qiu

Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking:…

Machine Learning · Computer Science 2026-05-15 Saba Ahmadi, Prasanna Parthasarathi, Yufei Cui

Diffusion Large Language Models (dLLMs) are rapidly emerging alongside autoregressive models as a powerful paradigm for complex reasoning, with reinforcement learning increasingly used for downstream alignment. Existing trajectory-based RL…

Machine Learning · Computer Science 2025-11-20 Ranfei Chen, Ming Chen, Kaifei Wang

Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR)…

Computation and Language · Computer Science 2025-06-04 Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu, Xinyin Ma, Xinchao Wang

We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

Diffusion Models (DMs), as a leading class of generative models, offer key advantages for reinforcement learning (RL), including multi-modal expressiveness, stable training, and trajectory-level planning. This survey delivers a…

Machine Learning · Computer Science 2025-10-15 Changfu Xu, Jianxiong Guo, Yuzhu Liang, Haiyang Huang, Haodong Zou, Xi Zheng, Shui Yu, Xiaowen Chu, Jiannong Cao, Tian Wang

Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their…

Computation and Language · Computer Science 2026-04-07 Jingyi Yang, Yuxian Jiang, Xuhao Hu, Shuang Cheng, Biqing Qi, Jing Shao

Large language model (LLM)-based embedding models, benefiting from large scale pre-training and post-training, have begun to surpass BERT and T5-based models on general-purpose text embedding tasks such as document retrieval. However, a…

Computation and Language · Computer Science 2025-05-22 Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao

Reinforcement learning has become a central paradigm for improving LLM reasoning, but most existing methods optimize policies over discrete token sequences. This creates a mismatch between the optimization space and the structure of…

Machine Learning · Computer Science 2026-05-19 Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Yi-An Ma, Lianhui Qin

Recent research has highlighted the powerful capabilities of imitation learning in robotics. Leveraging generative models, particularly diffusion models, these approaches offer notable advantages such as strong multi-task generalization,…

Robotics · Computer Science 2025-09-15 Xinyao Qin, Xiaoteng Ma, Yang Qi, Qihan Liu, Chuanyi Xue, Ning Gui, Qinyu Dong, Jun Yang, Bin Liang

Reinforcement learning (RL) has been effective for post-training autoregressive (AR) language models, but extending these methods to diffusion language models (DLMs) is challenging due to intractable sequence-level likelihoods. Existing…

Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes…

Machine Learning · Computer Science 2024-07-24 Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Hengtao Shen

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this…

Machine Learning · Computer Science 2026-04-09 Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, Stefano Ermon

Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. In this work, we aim to improve the policy optimization for dLLMs by…

Diffusion large language models (dLLMs) have emerged as a new architecture following autoregressive models. Their denoising process offers a powerful generative advantage, but they present significant challenges in learning and…

Machine Learning · Computer Science 2025-09-24 Ranfei Chen, Ming Chen

Thanks to their remarkable flexibility, diffusion models and flow models have emerged as promising candidates for policy representation. However, efficient reinforcement learning (RL) upon these policies remains a challenge due to the lack…

Machine Learning · Computer Science 2026-03-31 Chenxiao Gao, Edward Chen, Tianyi Chen, Bo Dai

Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such as quality degradation, over-stylization, or…

Diffusion language models (DLMs) offer a promising path toward low-latency generation through parallel decoding, but their practical efficiency depends heavily on the decoding trajectory. In practice, this advantage often fails to fully…

Computation and Language · Computer Science 2026-04-02 Lingjie Chen, Ruizhong Qiu, Yuyu Fan, Yanjun Zhao, Hanghang Tong