Related papers: ROPO: Robust Preference Optimization for Large Lan…

Reverse Preference Optimization for Complex Instruction Following

Instruction following (IF) is a critical capability for large language models (LLMs). However, handling complex instructions with multiple constraints remains challenging. Previous methods typically select preference pairs based on the…

Computation and Language · Computer Science · 2025-05-29 · Xiang Huang, Ting-En Lin, Feiteng Fang, Yuchuan Wu, Hangyu Li, Yuzhong Qu, Fei Huang, Yongbin Li

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes…

Machine Learning · Computer Science · 2025-04-21 · Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Lightweight Robust Direct Preference Optimization

Direct Preference Optimization (DPO) has become a popular method for fine-tuning large language models (LLMs) due to its stability and simplicity. However, it is also known to be sensitive to noise in the data and prone to overfitting.…

Machine Learning · Computer Science · 2025-10-28 · Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe

RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment

Standard human preference-based alignment methods, such as Reinforcement Learning from Human Feedback (RLHF), are a cornerstone for aligning large language models (LLMs) with human values. However, these methods typically assume that…

Artificial Intelligence · Computer Science · 2026-03-02 · Xiaoyang Cao, Zelai Xu, Mo Guang, Kaiwen Long, Michiel A. Bakker, Yu Wang, Chao Yu

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment

The rapid development of large language model (LLM) alignment algorithms has resulted in a complex and fragmented landscape, with limited clarity on the effectiveness of different methods and their inter-connections. This paper introduces…

Machine Learning · Computer Science · 2025-02-11 · Shengyang Sun, Yian Zhang, Alexander Bukharin, David Mosallanezhad, Jiaqi Zeng, Soumye Singhal, Gerald Shen, Adithya Renduchintala, Tugrul Konuk, Yi Dong, Zhilin Wang, Dmitry Chichkov, Olivier Delalleau, Oleksii Kuchaiev

One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise

Large Language Models (LLMs) have made significant strides in generating human-like responses, largely due to preference alignment techniques. However, these methods often assume unbiased human feedback, which is rarely the case in…

Machine Learning · Computer Science · 2025-09-16 · Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences…

Computation and Language · Computer Science · 2024-05-29 · Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

Large language models (LLMs) have shown great potential in natural language processing tasks, but their application to machine translation (MT) remains challenging due to pretraining on English-centric data and the complexity of…

Computation and Language · Computer Science · 2025-01-24 · Guofeng Cui, Pichao Wang, Yang Liu, Zemian Ke, Zhu Liu, Vimal Bhat

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences

Optimizing policies based on human preferences is key to aligning language models with human intent. This work focuses on reward modeling, a core component in reinforcement learning from human feedback (RLHF), and offline preference…

Machine Learning · Computer Science · 2025-06-02 · Soichiro Nishimori, Yu-Jie Zhang, Thanawat Lodkaew, Masashi Sugiyama

ComPO: Preference Alignment via Comparison Oracles

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy…

Computation and Language · Computer Science · 2025-10-28 · Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation

While Retrieval-Augmented Generation (RAG) has exhibited promise in utilizing external knowledge, its generation process heavily depends on the quality and accuracy of the retrieved context. Large language models (LLMs) struggle to evaluate…

Computation and Language · Computer Science · 2025-10-13 · Shi-Qi Yan, Quan Liu, Zhen-Hua Ling

Weights-Rotated Preference Optimization for Large Language Models

Despite the efficacy of Direct Preference Optimization (DPO) in aligning Large Language Models (LLMs), reward hacking remains a pivotal challenge. This issue emerges when LLMs excessively reduce the probability of rejected completions to…

Computation and Language · Computer Science · 2025-08-26 · Chenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin, Siyuan Chen, Weichong Yin, Hua Wu, Weiping Wang

Inducing Robustness in a 2 Dimensional Direct Preference Optimization Paradigm

Direct Preference Optimisation (DPO) has emerged as a powerful method for aligning Large Language Models (LLMs) with human preferences, offering a stable and efficient alternative to approaches that use Reinforcement learning via Human…

Artificial Intelligence · Computer Science · 2025-05-06 · Sarvesh Shashidhar, Ritik, Nachiketa Patil, Suraj Racha, Ganesh Ramakrishnan

RePO: Understanding Preference Learning Through ReLU-Based Optimization

Aligning large language models (LLMs) with human preferences is critical for real-world deployment, yet existing methods like RLHF face computational and stability challenges. While DPO establishes an offline paradigm with single…

Machine Learning · Computer Science · 2025-10-28 · Junkang Wu, Kexin Huang, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Learning from preference-based feedback has recently gained traction as a promising approach to align language models with human interests. While these aligned generative models have demonstrated impressive capabilities across various…

Machine Learning · Computer Science · 2024-04-15 · Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan

Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization

Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs), but the performance is still underwhelming due to too much noisy preference data yielded in the loop. To combat…

Computation and Language · Computer Science · 2024-09-18 · Jianing Wang, Yang Zhou, Xiaocheng Zhang, Mengjiao Bao, Peng Yan

Less is More: Improving LLM Alignment via Preference Data Selection

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from…

Machine Learning · Computer Science · 2026-02-17 · Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang, Xiangnan He

How Well Can Preference Optimization Generalize Under Noisy Feedback?

As large language models (LLMs) advance their capabilities, aligning these models with human preferences has become crucial. Preference optimization, which trains models to distinguish between preferred and non-preferred responses based on…

Machine Learning · Computer Science · 2026-02-02 · Shawn Im, Sharon Li

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that…

Computation and Language · Computer Science · 2025-01-23 · Qi Gou, Cam-Tu Nguyen

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Tao Zhang, Cheng Da, Kun Ding, Huan Yang, Kun Jin, Yan Li, Tingting Gao, Di Zhang, Shiming Xiang, Chunhong Pan