Related papers: Programming by Rewards

Programmatic Reward Design by Example

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design,…

Machine Learning · Computer Science 2022-01-10 Weichao Zhou, Wenchao Li

Neural Program Synthesis with Priority Queue Training

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a…

Artificial Intelligence · Computer Science 2018-03-28 Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning

The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, choosing the right potential function remains an open challenge.…

Machine Learning · Computer Science 2025-08-12 Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo

Program Synthesis using Natural Language

Interacting with computers is a ubiquitous activity for millions of people. Repetitive or specialized tasks often require creation of small, often one-off, programs. End-users struggle with learning and using the myriad of domain-specific…

Programming Languages · Computer Science 2015-09-02 Aditya Desai, Sumit Gulwani, Vineet Hingorani, Nidhi Jain, Amey Karkare, Mark Marron, Sailesh R, Subhajit Roy

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

Repairing Reward Functions with Feedback to Mitigate Reward Hacking

Human-designed reward functions for reinforcement learning (RL) agents are frequently misaligned with the humans' true, unobservable objectives, and thus act only as proxies. Optimizing for a misspecified proxy reward function often induces…

Artificial Intelligence · Computer Science 2026-01-30 Stephane Hatgis-Kessell, Logan Mondal Bhamidipaty, Emma Brunskill

Generating Pragmatic Examples to Train Neural Program Synthesizers

Programming-by-example is the task of synthesizing a program that is consistent with a set of user-provided input-output examples. As examples are often an under-specification of one's intent, a good synthesizer must choose the intended…

Machine Learning · Computer Science 2025-04-18 Saujas Vaduguru, Daniel Fried, Yewen Pu

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward…

Machine Learning · Computer Science 2021-12-22 Tom Bewley, Freddy Lecue

In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search

Decision trees, owing to their interpretability, are attractive as control policies for (dynamical) systems. Unfortunately, constructing, or synthesising, such policies is a challenging task. Previous approaches do so by imitating a…

Artificial Intelligence · Computer Science 2025-04-23 Emir Demirović, Christian Schilling, Anna Lukina

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample…

Artificial Intelligence · Computer Science 2024-02-29 Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

Grammar Filtering For Syntax-Guided Synthesis

Programming-by-example (PBE) is a synthesis paradigm that allows users to generate functions by simply providing input-output examples. While a promising interaction paradigm, synthesis is still too slow for realtime interaction and more…

Machine Learning · Computer Science 2020-02-10 Kairo Morton, William Hallahan, Elven Shum, Ruzica Piskac, Mark Santolucito

PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning

Offline imitation learning (offline IL) enables training effective policies without requiring explicit reward annotations. Recent approaches attempt to estimate rewards for unlabeled datasets using a small set of expert demonstrations.…

Machine Learning · Computer Science 2025-11-19 Shengjie Sun, Jiafei Lyu, Runze Liu, Mengbei Yan, Bo Liu, Deheng Ye, Xiu Li

Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning

Process Reward Models (PRMs) have emerged as a promising approach to enhance the reasoning capabilities of large language models (LLMs) by guiding their step-by-step reasoning toward a final answer. However, existing PRMs either treat each…

Machine Learning · Computer Science 2026-03-02 Zheng Zhang, Ziwei Shan, Kaitao Song, Yexin Li, Kan Ren

Property-Guided LLM Program Synthesis for Planning

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number…

Artificial Intelligence · Computer Science 2026-05-19 André G. Pereira, Augusto B. Corrêa, Jendrik Seipp

Reward Engineering for Reinforcement Learning in Software Tasks

Reinforcement learning is increasingly used for code-centric tasks. These tasks include code generation, summarization, understanding, repair, testing, and optimization. This trend is growing faster with large language models and autonomous…

Software Engineering · Computer Science 2026-01-28 Md Rayhanul Masud, Azmine Toushik Wasi, Salman Rahman, Md Rizwan Parvez

LLM Reasoning with Process Rewards for Outcome-Guided Steps

Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be checked automatically and converted into reliable training signals. Most such…

Machine Learning · Computer Science 2026-04-06 Mohammad Rezaei, Jens Lehmann, Sahar Vahdati

Program Synthesis Through Reinforcement Learning Guided Tree Search

Program Synthesis is the task of generating a program from a provided specification. Traditionally, this has been treated as a search problem by the programming languages (PL) community and more recently as a supervised learning problem by…

Artificial Intelligence · Computer Science 2018-06-11 Riley Simmons-Edler, Anders Miltner, Sebastian Seung

Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item…

Information Retrieval · Computer Science 2024-07-08 Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi, David Rohde, Flavian Vasile

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies…

Machine Learning · Computer Science 2024-10-10 Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe

Differentiable Functional Program Interpreters

Programming by Example (PBE) is the task of inducing computer programs from input-output examples. It can be seen as a type of machine learning where the hypothesis space is the set of legal programs in some programming language. Recent…

Programming Languages · Computer Science 2017-03-03 John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow