Related papers: Behavior Alignment via Reward Function Optimizatio…

Reward Design for Reinforcement Learning Agents

Reward functions are central in reinforcement learning (RL), guiding agents towards optimal decision-making. The complexity of RL tasks requires meticulously designed reward functions that effectively drive learning while avoiding…

Machine Learning · Computer Science · 2025-03-31 · Rati Devidze

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies…

Machine Learning · Computer Science · 2024-10-10 · Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe

Programmatic Reward Design by Example

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design,…

Machine Learning · Computer Science · 2022-01-10 · Weichao Zhou, Wenchao Li

REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback

The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is mainly dependent on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward…

Robotics · Computer Science · 2025-01-22 · Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the…

Machine Learning · Computer Science · 2025-02-25 · Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

Towards better dense rewards in Reinforcement Learning Applications

Finding meaningful and accurate dense rewards is a fundamental task in the field of reinforcement learning (RL) that enables agents to explore environments more efficiently. In traditional RL settings, agents learn optimal policies through…

Artificial Intelligence · Computer Science · 2025-12-05 · Shuyuan Zhang

Reward Models in Deep Reinforcement Learning: A Survey

In reinforcement learning (RL), agents continually interact with the environment and use the feedback to refine their behavior. To guide policy optimization, reward models are introduced as proxies of the desired objectives, such that when…

Machine Learning · Computer Science · 2025-06-19 · Rui Yu, Shenghua Wan, Yucen Wang, Chen-Xiao Gao, Le Gan, Zongzhang Zhang, De-Chuan Zhan

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task…

Machine Learning · Computer Science · 2018-11-20 · Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

Designing Rewards for Fast Learning

To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions…

Machine Learning · Computer Science · 2022-06-01 · Henry Sowerby, Zhiyuan Zhou, Michael L. Littman

Logic-based Reward Shaping for Multi-Agent Reinforcement Learning

Reinforcement learning (RL) relies heavily on exploration to learn from its environment and maximize observed rewards. Therefore, it is essential to design a reward function that guarantees optimal learning from the received experience.…

Artificial Intelligence · Computer Science · 2022-06-20 · Ingy ElSayed-Aly, Lu Feng

ALaRM: Align Language Models via Hierarchical Rewards Modeling

We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework…

Computation and Language · Computer Science · 2024-03-19 · Yuhang Lai, Siyuan Wang, Shujun Liu, Xuanjing Huang, Zhongyu Wei

Learning to Learn Group Alignment: A Self-Tuning Credo Framework with Multiagent Teams

Mixed incentives among a population with multiagent teams have been shown to have advantages over a fully cooperative system; however, discovering the best mixture of incentives or team structure is a difficult and dynamic problem. We…

Artificial Intelligence · Computer Science · 2023-04-18 · David Radke, Kyle Tilbury

Reward-Robust RLHF in LLMs

As Large Language Models (LLMs) continue to progress toward more advanced forms of intelligence, Reinforcement Learning from Human Feedback (RLHF) is increasingly seen as a key pathway toward achieving Artificial General Intelligence (AGI).…

Machine Learning · Computer Science · 2024-10-17 · Yuzi Yan, Xingzhou Lou, Jialian Li, Yiping Zhang, Jian Xie, Chao Yu, Yu Wang, Dong Yan, Yuan Shen

On Learning Intrinsic Rewards for Policy Gradient Methods

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design…

Artificial Intelligence · Computer Science · 2018-06-25 · Zeyu Zheng, Junhyuk Oh, Satinder Singh

VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences

Designing reward functions for continuous-control robotics often leads to subtle misalignments or reward hacking, especially in complex tasks. Preference-based RL mitigates some of these pitfalls by learning rewards from comparative…

Artificial Intelligence · Computer Science · 2025-03-19 · Anukriti Singh, Amisha Bhaskar, Peihong Yu, Souradip Chakraborty, Ruthwik Dasyam, Amrit Bedi, Pratap Tokekar

HAF-RM: A Hybrid Alignment Framework for Reward Model Training

The reward model has become increasingly important in alignment, assessment, and data construction for large language models (LLMs). Most existing research focuses on enhancing reward models through data improvements, following the…

Computation and Language · Computer Science · 2025-01-09 · Shujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue, Zengfeng Huang, Xuanjing Huang, Zhongyu Wei

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Reinforcement learning agents are fundamentally limited by the quality of the reward functions they learn from, yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in…

Machine Learning · Computer Science · 2025-07-28 · Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

Reward models (RMs) play a crucial role in reinforcement learning from human feedback (RLHF), aligning model behavior with human preferences. However, existing benchmarks for reward models show a weak correlation with the performance of…

Machine Learning · Computer Science · 2025-05-20 · Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, Jinyoung Yeo

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward…

Machine Learning · Computer Science · 2021-12-22 · Tom Bewley, Freddy Lecue

Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design

We propose a novel reinforcement learning (RL) design to optimize the charging strategy for autonomous mobile robots in large-scale block stacking warehouses. RL design involves a wide array of choices that can mostly only be evaluated…

Artificial Intelligence · Computer Science · 2025-05-19 · Janik Bischoff, Alexandru Rinciog, Anne Meyer