Related papers: Sequence-level Large Language Model Training with …

Aligning Visual Contrastive learning models via Preference Optimization

Contrastive learning models have demonstrated impressive abilities to capture semantic similarities by aligning representations in the embedding space. However, their performance can be limited by the quality of the training data and its…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-27 · Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh, Mahyar JafariNodeh, Sepehr Kazemi, Simon Gottschalk

Selective Preference Optimization via Token-Level Reward Function Estimation

Recent advancements in large language model alignment leverage token-level supervisions to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be…

Computation and Language · Computer Science · 2025-11-07 · Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou

Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness

Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These…

Computation and Language · Computer Science · 2024-09-27 · Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu

CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences

Preference optimization is a critical post-training technique used to align large language models (LLMs) with human preferences, typically by fine-tuning on ranked response pairs. While methods like Direct Preference Optimization (DPO) have…

Computation and Language · Computer Science · 2025-11-12 · Rhitabrat Pokharel, Yufei Tao, Ameeta Agrawal

Optimizing Language Models for Human Preferences is a Causal Inference Problem

As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial…

Machine Learning · Computer Science · 2024-06-07 · Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

Large language models (LLMs) have shown great potential in natural language processing tasks, but their application to machine translation (MT) remains challenging due to pretraining on English-centric data and the complexity of…

Computation and Language · Computer Science · 2025-01-24 · Guofeng Cui, Pichao Wang, Yang Liu, Zemian Ke, Zhu Liu, Vimal Bhat

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with…

Machine Learning · Computer Science · 2024-10-14 · Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang

Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization

Post-training alignment of large language models (LLMs) is a critical challenge, as not all tokens contribute equally to model performance. This paper introduces a selective alignment strategy that prioritizes high-impact tokens within…

Computation and Language · Computer Science · 2025-07-11 · Zhijin Dong

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences…

Computation and Language · Computer Science · 2024-05-29 · Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing…

Machine Learning · Computer Science · 2024-09-17 · Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri

Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning

Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences. Recent work has shown DPO's effectiveness relies on training data quality. In particular, clear quality differences…

Machine Learning · Computer Science · 2025-01-28 · Nirav Diwan, Tolga Ergen, Dongsub Shim, Honglak Lee

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

Direct alignment algorithms have proven an effective step for aligning language models to human-desired behaviors. Current variants of the Direct Preference Optimization objective have focused on a strict setting where all tokens are…

Computation and Language · Computer Science · 2025-11-03 · Fenia Christopoulou, Ronald Cardenas, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics…

Computation and Language · Computer Science · 2025-11-21 · Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This method, however, relies solely on pairwise comparisons, where the…

Computation and Language · Computer Science · 2025-01-09 · Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

Constrain Alignment with Sparse Autoencoders

The alignment of large language models (LLMs) with human preferences remains a key challenge. While post-training techniques like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have achieved…

Artificial Intelligence · Computer Science · 2025-07-11 · Qingyu Yin, Chak Tou Leong, Minjun Zhu, Hanqi Yan, Qiang Zhang, Yulan He, Wenjie Li, Jun Wang, Yue Zhang, Linyi Yang

TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation

Direct Preference Optimization is an offline post-SFT method for aligning language models from preference pairs, with strong results in instruction following and summarization. However, DPO's sequence-level implicit reward can be brittle…

Computation and Language · Computer Science · 2026-03-03 · Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah R. Lowe, Aimee Kendall Roundtree

ROPO: Robust Preference Optimization for Large Language Models

Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in the preference data.…

Machine Learning · Computer Science · 2024-05-29 · Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye

Reverse Preference Optimization for Complex Instruction Following

Instruction following (IF) is a critical capability for large language models (LLMs). However, handling complex instructions with multiple constraints remains challenging. Previous methods typically select preference pairs based on the…

Computation and Language · Computer Science · 2025-05-29 · Xiang Huang, Ting-En Lin, Feiteng Fang, Yuchuan Wu, Hangyu Li, Yuzhong Qu, Fei Huang, Yongbin Li

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

State-of-the-art natural language understanding classification models follow two-stages: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using cross-entropy loss.…

Computation and Language · Computer Science · 2021-04-06 · Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Prior work establishes that controlled contrastiveness between self-generated responses from large language models, set via reward scores, improves downstream preference tuning in English. We extend this method to multiple languages and…

Computation and Language · Computer Science · 2026-05-27 · Mike Zhang, Ali Basirat, Desmond Elliott