Related papers: Controllable Preference Optimization: Toward Contr…

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

Post-training of LLMs with RLHF, and subsequently preference optimization algorithms such as DPO, IPO, etc., made a big difference in improving human alignment. However, all such techniques can only work with a single (human) objective. In…

Machine Learning · Computer Science 2025-05-19 Akhil Agnihotri , Rahul Jain , Deepak Ramachandran , Zheng Wen

MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment

Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this…

Computation and Language · Computer Science 2025-07-23 Tianze Wang , Dongnan Gui , Yifan Hu , Shuhang Lin , Linjun Zhang

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science 2025-05-27 Anirudhan Badrinath , Prabhat Agarwal , Jiajing Xu

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

We study the problem of aligning large language models (LLMs) with human preference data. Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward…

Machine Learning · Computer Science 2024-12-20 Teng Xiao , Yige Yuan , Huaisheng Zhu , Mingxiao Li , Vasant G Honavar

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment

The rapid development of large language model (LLM) alignment algorithms has resulted in a complex and fragmented landscape, with limited clarity on the effectiveness of different methods and their inter-connections. This paper introduces…

Machine Learning · Computer Science 2025-02-11 Shengyang Sun , Yian Zhang , Alexander Bukharin , David Mosallanezhad , Jiaqi Zeng , Soumye Singhal , Gerald Shen , Adithya Renduchintala , Tugrul Konuk , Yi Dong , Zhilin Wang , Dmitry Chichkov , Olivier Delalleau , Oleksii Kuchaiev

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However,…

Computation and Language · Computer Science 2024-06-04 Pengyu Cheng , Yifan Yang , Jian Li , Yong Dai , Tianhao Hu , Peixin Cao , Nan Du , Xiaolong Li

Pareto-Optimal Learning from Preferences with Hidden Context

Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) leverages human preferences to achieve this alignment. However, when preferences are sourced from…

Machine Learning · Computer Science 2025-02-10 Ryan Bahlous-Boldi , Li Ding , Lee Spector , Scott Niekum

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with…

Machine Learning · Computer Science 2024-10-14 Xingzhou Lou , Junge Zhang , Jian Xie , Lifeng Liu , Dong Yan , Kaiqi Huang

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant…

Computation and Language · Computer Science 2025-07-03 Chengao Li , Hanyu Zhang , Yunkun Xu , Hongyan Xue , Xiang Ao , Qing He

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling

Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from…

Machine Learning · Computer Science 2026-02-11 Yuxuan Tang , Yifan Feng

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment

Multi-Objective Alignment (MOA) aims to align LLMs' responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from…

Machine Learning · Computer Science 2025-12-09 Moxin Li , Yuantao Zhang , Wenjie Wang , Wentao Shi , Zhuo Liu , Fuli Feng , Tat-Seng Chua

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics…

Computation and Language · Computer Science 2025-11-21 Hippolyte Gisserot-Boukhlef , Ricardo Rei , Emmanuel Malherbe , Céline Hudelot , Pierre Colombo , Nuno M. Guerreiro

Parameter-Efficient Tuning Helps Language Model Alignment

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science 2023-10-03 Tianci Xue , Ziqi Wang , Heng Ji

CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences

Preference optimization is a critical post-training technique used to align large language models (LLMs) with human preferences, typically by fine-tuning on ranked response pairs. While methods like Direct Preference Optimization (DPO) have…

Computation and Language · Computer Science 2025-11-12 Rhitabrat Pokharel , Yufei Tao , Ameeta Agrawal

Enhancing LLM Safety via Constrained Direct Preference Optimization

The rapidly increasing capabilities of large language models (LLMs) raise an urgent need to align AI systems with diverse human preferences to simultaneously enhance their usefulness and safety, despite the often conflicting nature of these…

Machine Learning · Computer Science 2024-03-06 Zixuan Liu , Xiaolin Sun , Zizhan Zheng

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen , Fang Liu , Jennifer Zhu , Wanyu Du , Yanjun Qi

Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a novel strategy where we combine multiple…

Machine Learning · Computer Science 2026-03-24 Wenwen Si , Sooyong Jang , Insup Lee , Osbert Bastani

Learning to Align Human Code Preferences

Large Language Models (LLMs) have demonstrated remarkable potential in automating software development tasks. While recent advances leverage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to align models with human…

Software Engineering · Computer Science 2025-12-09 Xin Yin , Chao Ni , Xiaohu Yang

Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization

Reinforcement Learning with Human Feedback (RLHF) enhances the alignment of Large Language Models (LLMs). However, its limitations have led to the development of Direct Preference Optimization (DPO), an RL-free approach designed to overcome…

Computation and Language · Computer Science 2025-02-19 Amir Saeidi , Shivanshu Verma , Aswin RRV , Kashif Rasul , Chitta Baral

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that…

Computation and Language · Computer Science 2025-01-23 Qi Gou , Cam-Tu Nguyen