Related papers: Understanding Likelihood Over-optimisation in Dire…

200 papers

Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO) have emerged as alternatives to the standard Reinforcement Learning from Human Feedback (RLHF) for aligning large language models (LLMs) with human values.…

Machine Learning · Computer Science 2025-06-12 Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy H. M. Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment.…

Machine Learning · Computer Science 2023-09-29 Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

With the rapid development and widespread application of Large Language Models (LLMs), their potential safety risks have attracted widespread attention. Reinforcement Learning from Human Feedback (RLHF) has been adopted to enhance the…

Artificial Intelligence · Computer Science 2026-03-25 Shiji Zhao, Mengyang Wang, Shukun Xiong, Fangzhou Chen, Qihui Zhu, Shouwei Ruan, Yisong Xiao, Ranjie Duan, Xun Chen, XingXing Wei

Aligning large language models (LLMs) with human preferences has become essential for safe and beneficial AI deployment. While Reinforcement Learning from Human Feedback (RLHF) established the dominant paradigm, a proliferation of…

Artificial Intelligence · Computer Science 2026-01-13 Tarun Raheja, Nilay Pochhi

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather…

Artificial Intelligence · Computer Science 2026-05-21 Zhiqin Yang, Yonggang Zhang, Wei Xue, Dong Fang, Bo Han, Yike Guo

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from…

Machine Learning · Computer Science 2026-02-17 Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang, Xiangnan He

Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback…

Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across…

Computation and Language · Computer Science 2024-04-09 Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of…

Artificial Intelligence · Computer Science 2024-06-11 Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences…

Computation and Language · Computer Science 2024-05-29 Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained…

Machine Learning · Computer Science 2024-11-06 Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

Recent alignment methods based on Direct Preference Optimization (DPO) reformulate preference learning as supervised optimization over pairwise comparisons, offering improved efficiency and stability over reinforcement learning from human…

Machine Learning · Computer Science 2026-01-22 Yuhui Sun, Xiyao Wang, Zixi Li, YiTian Ding, Tianyang Ling, Jialuo Chen, Tianyi Yu, Zhenlong Yuan, Jinman Zhao

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an…

Artificial Intelligence · Computer Science 2025-07-15 Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

Direct Preference Optimization (DPO) simplifies reinforcement learning from human feedback (RLHF) for large language models (LLMs) by directly optimizing human preferences without an explicit reward model. We find that during DPO training,…

Computation and Language · Computer Science 2026-01-01 Junshu Pan, Wei Shen, Shulin Huang, Qiji Zhou, Yue Zhang

Direct Preference Optimization (DPO) is a widely used reinforcement learning from human feedback (RLHF) method across various domains. Recent research has increasingly focused on the role of token importance in improving DPO effectiveness.…

Computation and Language · Computer Science 2025-12-01 Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu

Reinforcement Learning with Human Feedback (RLHF) and its variants have made huge strides toward the effective alignment of large language models (LLMs) to follow instructions and reflect human values. More recently, Direct Alignment…

Computation and Language · Computer Science 2025-06-02 Aman Gupta, Shao Tang, Qingquan Song, Sirou Zhu, Jiwoo Hong, Ankan Saha, Viral Gupta, Noah Lee, Eunki Kim, Siyu Zhu, Parag Agrawal, Natesh Pillai, S. Sathiya Keerthi

Modern alignment pipelines are increasingly replacing expensive human preference labels with evaluations from large language models (LLM-as-Judge). However, AI labels can be systematically biased compared to high-quality human feedback…

Machine Learning · Statistics 2026-02-10 Xintao Xia, Zhiqiu Xia, Linjun Zhang, Zhanrui Cai

Direct Preference Optimization (DPO) has shown strong potential for mitigating hallucinations in Multimodal Large Language Models (MLLMs). However, existing multimodal DPO approaches often suffer from overfitting due to the difficulty…

Artificial Intelligence · Computer Science 2026-01-05 Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He