Related papers: Reinforcing Diffusion Models by Direct Group Prefe…

200 papers

Recently, reinforcement learning (RL) has been employed for improving generative image super-resolution (ISR) performance. However, the current efforts are focused on multi-step generative ISR, while one-step generative ISR remains…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Qiaosi Yi, Shuai Li, Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, Lei Zhang

Aligning text-to-image (T2I) diffusion models with Direct Preference Optimization (DPO) has shown notable improvements in generation quality. However, applying DPO to T2I faces two challenges: the sensitivity of DPO to preference pairs and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-19 Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, Mingyu Guo

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng, Hong-Han Shuai

The incorporation of online reinforcement learning (RL) into diffusion and flow-based generative models has recently gained attention as a powerful paradigm for aligning model behavior with human preferences. By leveraging stochastic…

Machine Learning · Computer Science 2025-11-25 Yujie Zhou, Pengyang Ling, Jiazi Bu, Yibin Wang, Yuhang Zang, Jiaqi Wang, Li Niu, Guangtao Zhai

Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

Preference-based reinforcement learning (RL) is a key paradigm for aligning policies with human judgments, yet its theoretical behavior in distributed settings where preference data are fragmented across heterogeneous users remains poorly…

Machine Learning · Computer Science 2026-05-21 Zhanhong Jiang

Diffusion models have achieved state-of-the-art performance across multiple domains, with recent advancements extending their applicability to discrete data. However, aligning discrete diffusion models with task-specific preferences remains…

Machine Learning · Computer Science 2025-04-10 Umberto Borso, Davide Paglieri, Jude Wells, Tim Rocktäschel

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Radiography Report Generation (RRG) has gained significant attention in medical image analysis as a promising tool for alleviating the growing workload of radiologists. However, despite numerous advancements, existing methods have yet to…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Valentin Samokhin, Boris Shirokikh, Mikhail Goncharov, Dmitriy Umerenkov, Maksim Bobrin, Ivan Oseledets, Dmitry Dylov, Mikhail Belyaev

Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning…

Machine Learning · Computer Science 2026-02-12 Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on…

Machine Learning · Computer Science 2025-03-04 Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Pete Shaw, Jonathan Berant

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy…

Direct Preference Optimization (DPO) guides large language models (LLMs) to generate recommendations aligned with user historical behavior distributions by minimizing preference alignment loss. However, our systematic empirical research and…

Information Retrieval · Computer Science 2026-05-28 Chu Zhao, Enneng Yang, Jianzhe Zhao, Guibing Guo

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…

Direct Preference Optimization (DPO) has been successfully used to align large language models (LLMs) according to human preferences, and more recently it has also been applied to improving the quality of text-to-image diffusion models.…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Shivanshu Shekhar, Shreyas Singh, Tong Zhang

Group Relative Policy Optimization (GRPO) is highly effective for post-training autoregressive (AR) language models, yet its direct application to diffusion large language models (dLLMs) often triggers reward collapse. We identify two…

Machine Learning · Computer Science 2026-03-10 Jianyuan Zhong, Kaibo Wang, Ding Ding, Zijin Feng, Haoli Bai, Yang Xiang, Jiacheng Sun, Qiang Xu

This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes…

Machine Learning · Computer Science 2025-04-21 Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Policy-based Reinforcement Learning (RL) has established itself as the dominant paradigm in generative recommendation for optimizing sequential user interactions. However, when applied to offline historical logs, these methods suffer a…

Machine Learning · Computer Science 2026-02-12 Jie Jiang, Yusen Huo, Xiangxin Zhan, Changping Wang, Jun Zhang

Direct Preference Optimization (DPO) has become a popular method for fine-tuning large language models (LLMs) due to its stability and simplicity. However, it is also known to be sensitive to noise in the data and prone to overfitting.…

Machine Learning · Computer Science 2025-10-28 Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe

Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum…

Computer Vision and Pattern Recognition · Computer Science 2025-05-12 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah