Related papers: Spread Preference Annotation: Direct Preference Ju…

Alignment Data Map for Efficient Preference Data Selection and Diagnosis

Human preference data is essential for aligning large language models (LLMs) with human values, but collecting such data is often costly and inefficient, motivating the need for efficient data selection methods that reduce annotation costs…

Computation and Language · Computer Science 2026-04-21 Seohyeong Lee, Eunwon Kim, Hwaran Lee, Buru Chang

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science 2025-05-13 Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Larger or Smaller Reward Margins to Select Preferences for Alignment?

Preference learning is critical for aligning large language models (LLMs) with human values, with the quality of preference datasets playing a crucial role in this process. While existing metrics primarily assess data quality based on…

Machine Learning · Computer Science 2025-03-05 Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

Aligning large language models (LLMs) with human preferences is a critical challenge in AI research. While methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) are widely used, they often…

Computation and Language · Computer Science 2026-05-19 Xuan Qi, Rongwu Xu, Zhijing Jin

iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization

While astonishingly capable, Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information.…

Artificial Intelligence · Computer Science 2024-10-30 Long Tan Le, Han Shu, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran

Less is More: Improving LLM Alignment via Preference Data Selection

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from…

Machine Learning · Computer Science 2026-02-17 Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang, Xiangnan He

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with…

Machine Learning · Computer Science 2024-10-14 Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang

Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness

Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These…

Computation and Language · Computer Science 2024-09-27 Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu

Sample Efficient Preference Alignment in LLMs via Active Exploration

Preference-based feedback is important for many applications in machine learning where evaluation of a reward function is not feasible. Notable recent examples arise in preference alignment for large language models, including in…

Machine Learning · Computer Science 2025-03-21 Viraj Mehta, Syrine Belakaria, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Barbara Engelhardt, Stefano Ermon, Jeff Schneider, Willie Neiswanger

REAL: Response Embedding-based Alignment for LLMs

Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, which usually involve training on supervised datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely…

Computation and Language · Computer Science 2025-06-05 Honggen Zhang, Xufeng Zhao, Igor Molybog, June Zhang

SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins

Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the…

Machine Learning · Computer Science 2024-10-15 Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh, Sailik Sengupta, Sravan Bodapati, Aram Galstyan

ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference

Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most…

Machine Learning · Computer Science 2024-02-27 Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However,…

Computation and Language · Computer Science 2024-06-04 Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li

Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates…

Machine Learning · Computer Science 2025-02-04 Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. In recent times, several deployed systems have explored the usage of Large Language Models (LLMs) as…

Machine Learning · Computer Science 2026-01-27 Abhishek Divekar, Anirban Majumder

Aligning Crowd Feedback via Distributional Preference Reward Modeling

Deep Reinforcement Learning is widely used for aligning Large Language Models (LLMs) with human preference. However, the conventional reward modelling is predominantly dependent on human annotations provided by a select cohort of…

Artificial Intelligence · Computer Science 2024-05-31 Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu

Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings

Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use case-specific fine-tuning. However, analyzing live dialogues in real-time necessitates low-latency processing…

Computation and Language · Computer Science 2025-03-10 Xuanqing Liu, Luyang Kong, Wei Niu, Afshin Khashei, Belinda Zeng, Steve Johnson, Jon Jay, Davor Golac, Matt Pope

On the Role of Preference Variance in Preference Optimization

Direct Preference Optimization (DPO) has emerged as an important approach for learning from human preferences in aligning large language models (LLMs). However, collecting human preference data is costly and inefficient, motivating methods…

Computation and Language · Computer Science 2025-12-01 Jiacheng Guo, Zihao Li, Jiahao Qiu, Yue Wu, Mengdi Wang

Data-Centric Human Preference with Rationales for Direct Preference Alignment

Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. The human preference is typically represented through comparison where one response is…

Machine Learning · Computer Science 2025-07-15 Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia

Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment

Direct Preference Optimization (DPO) has become a prominent method for aligning Large Language Models (LLMs) with human preferences. While DPO has enabled significant progress in aligning English LLMs, multilingual preference alignment is…

Computation and Language · Computer Science 2025-06-06 Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, Jiajun Zhang