Related papers: MPO: An Efficient Post-Processing Framework for Mi…

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that…

Computation and Language · Computer Science 2025-01-23 Qi Gou, Cam-Tu Nguyen

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences. Recent approaches therefore prefer customization, gathering multi-dimensional feedback,…

Machine Learning · Computer Science 2024-08-20 Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao

MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples

Aligning Large Language Models (LLMs) with human feedback is crucial for their development. Existing preference optimization methods such as DPO and KTO, while improved based on Reinforcement Learning from Human Feedback (RLHF), are…

Computation and Language · Computer Science 2024-12-23 Shuo Xie, Fangzhi Zhu, Jiahui Wang, Lulu Wen, Wei Dai, Xiaowei Chen, Junxiong Zhu, Kai Zhou, Bo Zheng

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science 2025-05-27 Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

Post-training of LLMs with RLHF, and subsequently preference optimization algorithms such as DPO, IPO, etc., made a big difference in improving human alignment. However, all such techniques can only work with a single (human) objective. In…

Machine Learning · Computer Science 2025-05-19 Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Preference as Reward, Maximum Preference Optimization with Importance Sampling

Preference learning is a key technology for aligning language models with human values. Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm to optimize preference learning, which first fits a reward model for…

Machine Learning · Computer Science 2024-03-26 Zaifan Jiang, Xing Huang, Chao Wei

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant…

Computation and Language · Computer Science 2025-07-03 Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He

Multiplayer Nash Preference Optimization

Reinforcement learning from human feedback (RLHF) has emerged as the standard paradigm for aligning large language models with human preferences. However, reward-based methods grounded in the Bradley-Terry assumption struggle to capture the…

Artificial Intelligence · Computer Science 2026-04-08 Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi

Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into…

Artificial Intelligence · Computer Science 2024-12-03 Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment

Reinforcement Learning from Human Feedback (RLHF) has been proven to be an effective method for preference alignment of large language models (LLMs) and is widely used in the post-training process of LLMs. However, RLHF struggles with…

Computation and Language · Computer Science 2024-11-05 Dongxu Liu, Bing Xu, Yinzhuo Chen, Bufan Xu, Wenpeng Lu, Muyun Yang, Tiejun Zhao

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas…

Machine Learning · Computer Science 2025-02-20 Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF

Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly language models (LMs), with human preferences. There are typically multiple objectives driving the…

Machine Learning · Computer Science 2025-02-25 Nuoya Xiong, Aarti Singh

PEO: Improving Bi-Factorial Preference Alignment with Post-Training Policy Extrapolation

The alignment of large language models with human values presents a critical challenge, particularly when balancing conflicting objectives like helpfulness and harmlessness. Existing approaches, such as Reinforcement Learning from Human…

Computation and Language · Computer Science 2025-03-04 Yuxuan Liu

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent…

Machine Learning · Computer Science 2024-02-27 Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study Reinforcement…

Computation and Language · Computer Science 2023-10-19 Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

Reinforcement Learning from Human Feedback (RLHF) is currently the leading approach for aligning large language models with human preferences. Typically, these models rely on extensive offline preference datasets for training. However,…

Machine Learning · Computer Science 2024-12-17 Avinandan Bose, Zhihan Xiong, Aadirupa Saha, Simon Shaolei Du, Maryam Fazel

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling

Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from…

Machine Learning · Computer Science 2026-02-11 Yuxuan Tang, Yifan Feng

Direct Preference Optimization With Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Reinforcement Learning from Human Feedback (RLHF) has become central to aligning large language models with human values, typically by first learning a reward model from preference data which is then used to update the model with…

Machine Learning · Computer Science 2025-10-21 Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

A Survey of Direct Preference Optimization

Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback…

Machine Learning · Computer Science 2025-03-18 Shunyu Liu, Wenkai Fang, Zetian Hu, Junjie Zhang, Yang Zhou, Kongcheng Zhang, Rongcheng Tu, Ting-En Lin, Fei Huang, Mingli Song, Yongbin Li, Dacheng Tao