Related papers: Making RL with Preference-based Feedback Efficient…

200 papers

Learning from human preferences is a cornerstone of aligning machine learning models with subjective human judgments. Yet, collecting such preference data is often costly and time-consuming, motivating the need for more efficient learning…

Machine Learning · Computer Science 2025-11-07 Matteo Cercola, Valeria Capretti, Simone Formentin

Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the…

Machine Learning · Statistics 2026-02-11 Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, Chengchun Shi

Reinforcement learning with human feedback (RLHF), which learns a reward model from human preference data and then optimizes a policy to favor preferred responses, has emerged as a central paradigm for aligning large language models (LLMs)…

Machine Learning · Statistics 2025-09-29 Gen Li, Yuling Yan

We study the problem of reinforcement learning from human feedback (RLHF), a critical problem in training large language models, from a theoretical perspective. Our main contribution is the design of novel sample-efficient RLHF algorithms…

Machine Learning · Computer Science 2025-08-11 Han Qi, Haochen Yang, Qiaosheng Zhang, Zhuoran Yang

Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks. Existing methods work…

Reinforcement Learning from Human Feedback (RLHF) has become a popular approach to align language models (LMs) with human preferences. This method involves collecting a large dataset of human pairwise preferences across various text…

Machine Learning · Computer Science 2024-10-24 Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko

Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from…

Machine Learning · Computer Science 2025-05-30 Aya Kayal, Sattar Vakili, Laura Toni, Da-shan Shiu, Alberto Bernacchia

Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) to human preferences. RLHF contains three steps, i.e., human preference collecting, reward learning, and policy…

Computation and Language · Computer Science 2024-03-29 Hao Lang, Fei Huang, Yongbin Li

Reinforcement learning from human feedback (RLHF) replaces hard-to-specify rewards with pairwise trajectory preferences, yet regret-oriented theory often assumes that preference labels are generated consistently from a single ground-truth…

Machine Learning · Computer Science 2026-04-03 Ming Shi, Yingbin Liang, Ness B. Shroff, Ananthram Swami

Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from…

Reinforcement learning from human feedback (RLHF) has emerged as a central framework for aligning large language models (LLMs) with human preferences. Despite its practical success, RLHF raises fundamental statistical questions because it…

Machine Learning · Statistics 2026-04-06 Pangpang Liu, Chengchun Shi, Will Wei Sun

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second,…

Machine Learning · Computer Science 2024-05-01 Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual…

Machine Learning · Computer Science 2024-08-20 Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of…

Computers and Society · Computer Science 2023-11-29 Nathan Lambert, Thomas Krendl Gilbert, Tom Zick

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative…

Machine Learning · Computer Science 2025-12-05 Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between…

Machine Learning · Computer Science 2023-09-08 W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

Human feedback often arrives as preferences rather than calibrated numeric rewards, motivating reinforcement learning from preferential feedback, also referred to as reinforcement learning from human feedback (RLHF). We present a rigorous…

Machine Learning · Statistics 2026-05-26 Nikola Pavlovic, Sattar Vakili, Qing Zhao

Reinforcement learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) directly learns from reward signals. Preferences arguably contain less information than rewards, which makes…

Machine Learning · Computer Science 2023-11-07 Yuanhao Wang, Qinghua Liu, Chi Jin

Reinforcement Learning from Human Feedback (RLHF) is a widely used framework for the training of language models. However, the process of using RLHF to develop a language model that is well-aligned presents challenges, especially when it…

Computation and Language · Computer Science 2024-04-09 Bowen Qin, Duanyu Feng, Xi Yang

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful approach for aligning generative models, but its reliance on learned reward models makes it vulnerable to mis-specification and reward hacking. Preference-based…

Machine Learning · Computer Science 2026-04-23 Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen