Related papers: The Preference Learning Toolbox

POLAR: Preference Optimization and Learning Algorithms for Robotics

Parameter tuning for robotic systems is a time-consuming and challenging task that often relies on domain expertise of the human operator. Moreover, existing learning methods are not well suited for parameter tuning for many reasons…

Robotics · Computer Science · 2022-08-10 · Maegan Tucker, Kejun Li, Yisong Yue, Aaron D. Ames

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly,…

Computation and Language · Computer Science · 2024-10-10 · Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Inverse Preference Learning: Preference-based RL without a Reward Function

Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of…

Machine Learning · Computer Science · 2023-11-28 · Joey Hejna, Dorsa Sadigh

Multi-Response Preference Optimization with Augmented Ranking Dataset

Recent advancements in Large Language Models (LLMs) have been remarkable, with new models consistently surpassing their predecessors. These advancements are underpinned by extensive research on various training mechanisms. Among these,…

Computation and Language · Computer Science · 2024-12-12 · Hansle Gwon, Imjin Ahn, Young-Hak Kim, Sanghyun Park, Tae Joon Jun

What Is Preference Optimization Doing, and Why?

Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving great success. A common belief is that DPO is…

Machine Learning · Computer Science · 2026-05-18 · Yue Wang, Qizhou Wang, Zizhuo Zhang, Gang Niu, Bo Han, Masashi Sugiyama

Cross-Preference Learning for Sentence-Level and Context-Aware Machine Translation

Context-aware machine translation (MT) leverages document-level information, yet it does not consistently outperform sentence-level MT, as contextual signals are unevenly beneficial across sentences. Existing training objectives do not…

Computation and Language · Computer Science · 2026-03-27 · Ying Li, Xinglin Lyu, Junhui Li, Jinlong Yang, Hengchao Shang, Min Zhang, Shimin Tao, Daimeng Wei

Preference-based Multi-Objective Reinforcement Learning

Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design for balancing conflicting goals and…

Machine Learning · Computer Science · 2025-07-21 · Ni Mu, Yao Luan, Qing-Shan Jia

PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning

Preference-based reinforcement learning (RL) has emerged as a new field in robot learning, where humans play a pivotal role in shaping robot behavior by expressing preferences on different sequences of state-action pairs. However,…

Robotics · Computer Science · 2024-02-26 · Simon Holk, Daniel Marta, Iolanda Leite

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process. However, as the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need…

Computation and Language · Computer Science · 2025-02-18 · Xuemiao Zhang, Liangyu Xu, Feiyu Duan, Yongwei Zhou, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

PLOT: Enhancing Preference Learning via Optimal Transport

Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global…

Computation and Language · Computer Science · 2026-04-03 · Liang Zhu, Yuelin Bai, Xiankun Ren, Jiaxi Yang, Lei Zhang, Feiteng Fang, Hamid Alinejad-Rokny, Minghuan Tan, Min Yang

Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

Preference-Based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily…

Machine Learning · Computer Science · 2024-10-16 · Ziang Liu, Junjie Xu, Xingjiao Wu, Jing Yang, Liang He

Advances in Preference-based Reinforcement Learning: A Review

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science · 2024-08-23 · Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Offline Preference-Based Apprenticeship Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science · 2022-02-18 · Daniel Shin, Daniel S. Brown, Anca D. Dragan

Predictive Preference Learning from Human Interventions

Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do…

Machine Learning · Computer Science · 2025-10-17 · Haoyuan Cai, Zhenghao Peng, Bolei Zhou

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find…

Machine Learning · Computer Science · 2023-05-31 · Toygun Basaklar, Suat Gumussoy, Umit Y. Ogras

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.…

Machine Learning · Computer Science · 2024-07-08 · Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and…

Machine Learning · Computer Science · 2024-06-04 · Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

Opportunities in Machine Learning for Particle Accelerators

Machine learning (ML) is a subfield of artificial intelligence. The term applies broadly to a collection of computational algorithms and techniques that train systems from raw data rather than a priori models. ML techniques are now…

Accelerator Physics · Physics · 2018-11-09 · Auralee Edelen, Christopher Mayes, Daniel Bowring, Daniel Ratner, Andreas Adelmann, Rasmus Ischebeck, Jochem Snuverink, Ilya Agapov, Raimund Kammering, Jonathan Edelen, Ivan Bazarov, Gianluca Valentino, Jorg Wenninger

ICPL: Few-shot In-context Preference Learning via LLMs

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa. We demonstrate that Large Language Models (LLMs)…

Artificial Intelligence · Computer Science · 2025-04-04 · Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to…

Computation and Language · Computer Science · 2024-11-01 · Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang