Related papers: The Preference Learning Toolbox

200 papers

Parameter tuning for robotic systems is a time-consuming and challenging task that often relies on domain expertise of the human operator. Moreover, existing learning methods are not well suited for parameter tuning for many reasons…

Robotics · Computer Science 2022-08-10 Maegan Tucker, Kejun Li, Yisong Yue, Aaron D. Ames

Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly,…

Computation and Language · Computer Science 2024-10-10 Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of…

Machine Learning · Computer Science 2023-11-28 Joey Hejna, Dorsa Sadigh

Recent advancements in Large Language Models (LLMs) have been remarkable, with new models consistently surpassing their predecessors. These advancements are underpinned by extensive research on various training mechanisms. Among these,…

Computation and Language · Computer Science 2024-12-12 Hansle Gwon, Imjin Ahn, Young-Hak Kim, Sanghyun Park, Tae Joon Jun

Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving great success. A common belief is that DPO is…

Machine Learning · Computer Science 2026-05-18 Yue Wang, Qizhou Wang, Zizhuo Zhang, Gang Niu, Bo Han, Masashi Sugiyama

Context-aware machine translation (MT) leverages document-level information, yet it does not consistently outperform sentence-level MT, as contextual signals are unevenly beneficial across sentences. Existing training objectives do not…

Computation and Language · Computer Science 2026-03-27 Ying Li, Xinglin Lyu, Junhui Li, Jinlong Yang, Hengchao Shang, Min Zhang, Shimin Tao, Daimeng Wei

Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design for balancing conflicting goals and…

Machine Learning · Computer Science 2025-07-21 Ni Mu, Yao Luan, Qing-Shan Jia

Preference-based reinforcement learning (RL) has emerged as a new field in robot learning, where humans play a pivotal role in shaping robot behavior by expressing preferences on different sequences of state-action pairs. However,…

Robotics · Computer Science 2024-02-26 Simon Holk, Daniel Marta, Iolanda Leite

Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process. However, as the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need…

Computation and Language · Computer Science 2025-02-18 Xuemiao Zhang, Liangyu Xu, Feiyu Duan, Yongwei Zhou, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global…

Computation and Language · Computer Science 2026-04-03 Liang Zhu, Yuelin Bai, Xiankun Ren, Jiaxi Yang, Lei Zhang, Feiteng Fang, Hamid Alinejad-Rokny, Minghuan Tan, Min Yang

Preference-based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily…

Machine Learning · Computer Science 2024-10-16 Ziang Liu, Junjie Xu, Xingjiao Wu, Jing Yang, Liang He

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science 2024-08-23 Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2022-02-18 Daniel Shin, Daniel S. Brown, Anca D. Dragan

Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do…

Machine Learning · Computer Science 2025-10-17 Haoyuan Cai, Zhenghao Peng, Bolei Zhou

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find…

Machine Learning · Computer Science 2023-05-31 Toygun Basaklar, Suat Gumussoy, Umit Y. Ogras

Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.…

Machine Learning · Computer Science 2024-07-08 Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and…

Machine learning (ML) is a subfield of artificial intelligence. The term applies broadly to a collection of computational algorithms and techniques that train systems from raw data rather than a priori models. ML techniques are now…

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa. We demonstrate that Large Language Models (LLMs)…

Artificial Intelligence · Computer Science 2025-04-04 Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to…
