Related papers: Programmatic Reward Design by Example

Reward Design for Reinforcement Learning Agents

Reward functions are central in reinforcement learning (RL), guiding agents towards optimal decision-making. The complexity of RL tasks requires meticulously designed reward functions that effectively drive learning while avoiding…

Machine Learning · Computer Science 2025-03-31 Rati Devidze

Towards better dense rewards in Reinforcement Learning Applications

Finding meaningful and accurate dense rewards is a fundamental task in the field of reinforcement learning (RL) that enables agents to explore environments more efficiently. In traditional RL settings, agents learn optimal policies through…

Artificial Intelligence · Computer Science 2025-12-05 Shuyuan Zhang

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications

The aim of Reinforcement Learning (RL) in real-world applications is to create systems capable of making autonomous decisions by learning from their environment through trial and error. This paper emphasizes the importance of reward…

Machine Learning · Computer Science 2024-12-31 Sinan Ibrahim, Mostafa Mostafa, Ali Jnadi, Hadi Salloum, Pavel Osinenko

Reward Engineering for Reinforcement Learning in Software Tasks

Reinforcement learning is increasingly used for code-centric tasks. These tasks include code generation, summarization, understanding, repair, testing, and optimization. This trend is growing faster with large language models and autonomous…

Software Engineering · Computer Science 2026-01-28 Md Rayhanul Masud, Azmine Toushik Wasi, Salman Rahman, Md Rizwan Parvez

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results -- while in principle the reward only needs…

Machine Learning · Computer Science 2022-10-19 Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham M. Kakade, Sergey Levine

Behavior Alignment via Reward Function Optimization

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that…

Machine Learning · Computer Science 2023-11-01 Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Assisted Robust Reward Design

Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however,…

Robotics · Computer Science 2021-11-19 Jerry Zhi-Yang He, Anca D. Dragan

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training…

Artificial Intelligence · Computer Science 2020-10-08 Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

Designing Rewards for Fast Learning

To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions…

Machine Learning · Computer Science 2022-06-01 Henry Sowerby, Zhiyuan Zhou, Michael L. Littman

Effective Reward Specification in Deep Reinforcement Learning

In the last decade, Deep Reinforcement Learning has evolved into a powerful tool for complex sequential decision-making problems. It combines deep learning's proficiency in processing rich input signals with reinforcement learning's…

Machine Learning · Computer Science 2024-12-11 Julien Roy

On Learning Intrinsic Rewards for Policy Gradient Methods

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design…

Artificial Intelligence · Computer Science 2018-06-25 Zeyu Zheng, Junhyuk Oh, Satinder Singh

Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks

Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where the rewards are sparse. Some recent approaches have proposed to specify reward functions as manually designed or learned reward…

Machine Learning · Computer Science 2024-01-26 Shuai Han, Mehdi Dastani, Shihan Wang

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Reinforcement learning (RL) algorithms assume that users specify tasks by manually writing down a reward function. However, this process can be laborious and demands considerable technical expertise. Can we devise RL algorithms that instead…

Machine Learning · Computer Science 2022-01-03 Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however,…

Machine Learning · Computer Science 2022-01-19 Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large…

Robotics · Computer Science 2023-10-03 Jiayang Song, Zhehua Zhou, Jiawei Liu, Chunrong Fang, Zhan Shu, Lei Ma

Reward Models in Deep Reinforcement Learning: A Survey

In reinforcement learning (RL), agents continually interact with the environment and use the feedback to refine their behavior. To guide policy optimization, reward models are introduced as proxies of the desired objectives, such that when…

Machine Learning · Computer Science 2025-06-19 Rui Yu, Shenghua Wan, Yucen Wang, Chen-Xiao Gao, Le Gan, Zongzhang Zhang, De-Chuan Zhan

Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards

Reward design is a critical part of the application of reinforcement learning, the performance of which strongly depends on how well the reward signal frames the goal of the designer and how well the signal assesses progress in reaching…

Machine Learning · Computer Science 2022-08-01 Yixiang Wang, Yujing Hu, Feng Wu, Yingfeng Chen

Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation

Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism for improvement, but its effectiveness is…

Machine Learning · Computer Science 2026-02-11 Pei-Chi Pan, Yingbin Liang, Sen Lin

Defining Admissible Rewards for High Confidence Policy Evaluation

A key impediment to reinforcement learning (RL) in real applications with limited, batch data is defining a reward function that reflects what we implicitly know about reasonable behaviour for a task and allows for robust off-policy…

Machine Learning · Computer Science 2019-05-31 Niranjani Prasad, Barbara E Engelhardt, Finale Doshi-Velez

Reward Signal Design for Autonomous Racing

Reinforcement learning (RL) has shown to be a valuable tool in training neural networks for autonomous motion planning. The application of RL to a specific problem is dependent on a reward signal to quantify how good or bad a certain action…

Robotics · Computer Science 2024-10-28 Benjamin Evans, Herman A. Engelbrecht, Hendrik W. Jordaan