Related papers: Filtered Direct Preference Optimization

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing…

Machine Learning · Computer Science 2024-07-31 Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

New Desiderata for Direct Preference Optimization

Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when…

Computation and Language · Computer Science 2024-07-15 Xiangkun Hu, Tong He, David Wipf

Active Learning for Direct Preference Optimization

Direct preference optimization (DPO) is a form of reinforcement learning from human feedback (RLHF) where the policy is learned directly from preferential feedback. Although many models of human preferences exist, the critical task of…

Machine Learning · Computer Science 2025-03-04 Branislav Kveton, Xintong Li, Julian McAuley, Ryan Rossi, Jingbo Shang, Junda Wu, Tong Yu

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

A Survey of Direct Preference Optimization

Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback…

Machine Learning · Computer Science 2025-03-18 Shunyu Liu, Wenkai Fang, Zetian Hu, Junjie Zhang, Yang Zhou, Kongcheng Zhang, Rongcheng Tu, Ting-En Lin, Fei Huang, Mingli Song, Yongbin Li, Dacheng Tao

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to…

Machine Learning · Computer Science 2024-03-26 Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li

Enhancing LLM Safety via Constrained Direct Preference Optimization

The rapidly increasing capabilities of large language models (LLMs) raise an urgent need to align AI systems with diverse human preferences to simultaneously enhance their usefulness and safety, despite the often conflicting nature of these…

Machine Learning · Computer Science 2024-03-06 Zixuan Liu, Xiaolin Sun, Zizhan Zheng

Direct Alignment of Language Models via Quality-Aware Self-Refinement

Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align the behaviors of Large Language Models (LLMs) with human preferences. Recently, a popular alternative is Direct Policy Optimization (DPO), which replaces an…

Computation and Language · Computer Science 2024-06-03 Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok

Disentangling Length from Quality in Direct Preference Optimization

Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent…

Computation and Language · Computer Science 2024-09-10 Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an…

Artificial Intelligence · Computer Science 2025-07-15 Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Direct Preference Optimization (DPO) simplifies reinforcement learning from human feedback (RLHF) for large language models (LLMs) by directly optimizing human preferences without an explicit reward model. We find that during DPO training,…

Computation and Language · Computer Science 2026-01-01 Junshu Pan, Wei Shen, Shulin Huang, Qiji Zhou, Yue Zhang

Direct Preference Optimization With Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Reinforcement Learning from Human Feedback (RLHF) has become central to aligning large language models with human values, typically by first learning a reward model from preference data which is then used to update the model with…

Machine Learning · Computer Science 2025-10-21 Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data. Direct Preference Optimization (DPO) offers an efficient…

Machine Learning · Computer Science 2026-03-23 Kewen Zhu, Liping Yi, Zhiming Zhao, Zhuang Qi, Han Yu, Qinghua Hu

Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Reinforcement Learning from Human Feedback (RLHF) has become central to aligning large language models with human values, typically by first learning a reward model from preference data which is then used to update the model with…

Artificial Intelligence · Computer Science 2025-10-20 Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that…

Computation and Language · Computer Science 2025-01-23 Qi Gou, Cam-Tu Nguyen

Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying…

Computation and Language · Computer Science 2025-09-30 Kaden Uhlig, Joern Wuebker, Raphael Reinauer, John DeNero

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

In this paper, we take a step towards a deeper understanding of learning from human preferences by systematically comparing the paradigm of reinforcement learning from human feedback (RLHF) with the recently proposed paradigm of direct…

Machine Learning · Computer Science 2024-06-06 Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Accelerated Preference Optimization for Large Language Model Alignment

Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences. Direct Preference Optimization (DPO), one of the most popular approaches, formulates RLHF as a…

Machine Learning · Computer Science 2024-10-10 Jiafan He, Huizhuo Yuan, Quanquan Gu

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

Large language models (LLMs) have shown great potential in natural language processing tasks, but their application to machine translation (MT) remains challenging due to pretraining on English-centric data and the complexity of…

Computation and Language · Computer Science 2025-01-24 Guofeng Cui, Pichao Wang, Yang Liu, Zemian Ke, Zhu Liu, Vimal Bhat

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free.…

Computation and Language · Computer Science 2024-10-11 Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu