Related papers: Rating-based Reinforcement Learning

Reinforcement Learning from Multi-level and Episodic Human Feedback

Designing an effective reward function has long been a challenge in reinforcement learning, particularly for complex tasks in unstructured environments. To address this, various learning paradigms have emerged that leverage different forms…

Machine Learning · Computer Science 2025-04-29 Muhammad Qasim Elahi, Somtochukwu Oguchienti, Maheed H. Ahmed, Mahsa Ghasemi

Multi-Task Reward Learning from Human Ratings

Reinforcement learning from human feedback (RLHF) has become a key factor in aligning model behavior with users' goals. However, while humans integrate multiple strategies when making decisions, current RLHF approaches often simplify this…

Machine Learning · Computer Science 2025-06-19 Mingkang Wu, Devin White, Evelyn Rose, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao

Performance Optimization of Ratings-Based Reinforcement Learning

This paper explores multiple optimization methods to improve the performance of rating-based reinforcement learning (RbRL). RbRL, a method based on the idea of human ratings, has been developed to infer reward functions in reward-free…

Machine Learning · Computer Science 2025-01-15 Evelyn Rose, Devin White, Mingkang Wu, Vernon Lawhern, Nicholas R. Waytowich, Yongcan Cao

Prioritized Experience-based Reinforcement Learning with Human Guidance for Autonomous Driving

Reinforcement learning (RL) requires skillful definition and remarkable computational efforts to solve optimization and control problems, which could impair its prospect. Introducing human guidance into reinforcement learning is a promising…

Machine Learning · Computer Science 2022-11-30 Jingda Wu, Zhiyu Huang, Wenhui Huang, Chen Lv

Reward Learning through Ranking Mean Squared Error

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified.…

Machine Learning · Computer Science 2026-01-16 Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

Reinforcement Learning from Diverse Human Preferences

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new…

Machine Learning · Computer Science 2024-05-09 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

Human-Inspired Multi-Level Reinforcement Learning

Reinforcement learning (RL), a common tool in decision making, learns control policies from various experiences based on the associated cumulative return/rewards without treating them differently. Humans, on the contrary, often learn to…

Machine Learning · Computer Science 2025-11-25 Mingkang Wu, Devin White, Vernon Lawhern, Nicholas R. Waytowich, Yongcan Cao

Advances in Preference-based Reinforcement Learning: A Review

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science 2024-08-23 Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses…

Robotics · Computer Science 2023-08-08 Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang

Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models

Designing effective reward functions remains a fundamental challenge in reinforcement learning (RL), as it often requires extensive human effort and domain expertise. While RL from human feedback has been successful in aligning agents with…

Machine Learning · Computer Science 2025-06-17 Tung Minh Luu, Younghwan Lee, Donghoon Lee, Sunho Kim, Min Jun Kim, Chang D. Yoo

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative…

Machine Learning · Computer Science 2025-12-05 Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models

The correct specification of reward models is a well-known challenge in reinforcement learning. Hand-crafted reward functions often lead to inefficient or suboptimal policies and may not be aligned with user values. Reinforcement learning…

Artificial Intelligence · Computer Science 2024-10-24 Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Simon Stepputtis, Joseph Campbell, Katia Sycara

Models of human preference for learning reward functions

The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between…

Machine Learning · Computer Science 2023-09-08 W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

Opinion-Guided Reinforcement Learning

Human guidance is often desired in reinforcement learning to improve the performance of the learning agent. However, human insights are often mere opinions and educated guesses rather than well-formulated arguments. While opinions are…

Machine Learning · Computer Science 2024-08-06 Kyanna Dagenais, Istvan David

Preference-based Learning of Reward Function Features

Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward…

Robotics · Computer Science 2021-03-05 Sydney M. Katz, Amir Maleki, Erdem Bıyık, Mykel J. Kochenderfer

Capturing Individual Human Preferences with Reward Features

Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for…

Artificial Intelligence · Computer Science 2026-02-20 André Barreto, Vincent Dumoulin, Yiran Mao, Mark Rowland, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle

Learning Multimodal Rewards from Rankings

Learning from human feedback has shown to be a useful approach in acquiring robot reward functions. However, expert feedback is often assumed to be drawn from an underlying unimodal reward function. This assumption does not always hold…

Machine Learning · Computer Science 2021-10-20 Vivek Myers, Erdem Bıyık, Nima Anari, Dorsa Sadigh

Tell me why: Training preferences-based RL with human preferences and step-level explanations

Human-in-the-loop reinforcement learning allows the training of agents through various interfaces, even for non-expert humans. Recently, preference-based methods (PbRL), where the human has to give his preference over two trajectories,…

Artificial Intelligence · Computer Science 2024-08-06 Jakob Karalus

Reducing Reward Dependence in RL Through Adaptive Confidence Discounting

In human-in-the-loop reinforcement learning or environments where calculating a reward is expensive, the costly rewards can make learning efficiency challenging to achieve. The cost of obtaining feedback from humans or calculating expensive…

Machine Learning · Computer Science 2025-03-03 Muhammed Yusuf Satici, David L. Roberts

Robust Reward Alignment via Hypothesis Space Batch Cutting

Reward design in reinforcement learning and optimal control is challenging. Preference-based alignment addresses this by enabling agents to learn rewards from ranked trajectory pairs provided by humans. However, existing methods often…

Machine Learning · Computer Science 2025-05-29 Zhixian Xie, Haode Zhang, Yizhe Feng, Wanxin Jin