Related papers: Preference Optimization for Molecular Language Mod…

Investigating on RLHF methodology

In this article, we investigate the alignment of Large Language Models according to human preferences. We discuss the features of training a Preference Model, which simulates human preferences, and the methods and details we found essential…

Machine Learning · Computer Science 2024-10-03 Alexey Kutalev, Sergei Markoff

Multi-Reference Preference Optimization for Large Language Models

How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preference on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a…

Computation and Language · Computer Science 2024-05-28 Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science 2024-07-26 Tianduo Wang, Shichen Li, Wei Lu

Optimizing Language Models for Human Preferences is a Causal Inference Problem

As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial…

Machine Learning · Computer Science 2024-06-07 Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency

Preference optimization of protein language models as a multi-objective binder design paradigm

We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model…

Biological Physics · Physics 2024-03-08 Pouria Mistani, Venkatesh Mysore

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science 2025-05-27 Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment

Direct Preference Optimization (DPO) is broadly utilized for aligning Large Language Models (LLMs) with human values because of its flexibility. Despite its effectiveness, it has been observed that the capability of DPO to generate…

Machine Learning · Computer Science 2025-05-20 Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, Yulun Zhang

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing…

Machine Learning · Computer Science 2024-07-31 Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads

Pre-trained Language Models (LMs) exhibit strong zero-shot and in-context learning capabilities; however, their behaviors are often difficult to control. By utilizing Reinforcement Learning from Human Feedback (RLHF), it is possible to…

Computation and Language · Computer Science 2024-05-31 Avelina Asada Hadji-Kyriacou, Ognjen Arandjelovic

Preference Packing: Efficient Preference Optimization for Large Language Models

Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. In particular, batch packing is commonly used in pre-training and supervised fine-tuning…

Computation and Language · Computer Science 2026-03-02 Jaekyung Cho

ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference

Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most…

Machine Learning · Computer Science 2024-02-27 Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning

Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences. Recent work has shown DPO's effectiveness relies on training data quality. In particular, clear quality differences…

Machine Learning · Computer Science 2025-01-28 Nirav Diwan, Tolga Ergen, Dongsub Shim, Honglak Lee

Data Selection for LLM Alignment Using Fine-Grained Preferences

Large language models (LLMs) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment…

Machine Learning · Computer Science 2026-03-03 Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

Building Math Agents with Multi-Turn Iterative Preference Learning

Recent studies have shown that large language models' (LLMs) mathematical problem-solving capabilities can be enhanced by integrating external tools, such as code interpreters, and employing multi-turn Chain-of-Thought (CoT) reasoning.…

Machine Learning · Computer Science 2025-03-03 Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu

Preference Alignment Improves Language Model-Based TTS

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer competitive performance to their counterparts. Further optimization can be achieved through preference alignment algorithms, which adjust…

Computation and Language · Computer Science 2024-09-20 Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu

Towards Efficient Exact Optimization of Language Model Alignment

The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with…

Computation and Language · Computer Science 2024-06-06 Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang

A Survey of Direct Preference Optimization

Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback…

Machine Learning · Computer Science 2025-03-18 Shunyu Liu, Wenkai Fang, Zetian Hu, Junjie Zhang, Yang Zhou, Kongcheng Zhang, Rongcheng Tu, Ting-En Lin, Fei Huang, Mingli Song, Yongbin Li, Dacheng Tao

Active Preference Learning for Large Language Models

As large language models (LLMs) become more capable, fine-tuning techniques for aligning with human intent are increasingly important. A key consideration for aligning these models is how to most effectively use human resources, or model…

Machine Learning · Computer Science 2024-07-01 William Muldrew, Peter Hayes, Mingtian Zhang, David Barber