Related papers: Programming by Rewards

200 papers

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design,…

Machine Learning · Computer Science · 2022-01-10 · Weichao Zhou, Wenchao Li

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a…

Artificial Intelligence · Computer Science · 2018-03-28 · Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, choosing the right potential function remains an open challenge.…

Machine Learning · Computer Science · 2025-08-12 · Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo

Interacting with computers is a ubiquitous activity for millions of people. Repetitive or specialized tasks often require creation of small, often one-off, programs. End-users struggle with learning and using the myriad of domain-specific…

Programming Languages · Computer Science · 2015-09-02 · Aditya Desai, Sumit Gulwani, Vineet Hingorani, Nidhi Jain, Amey Karkare, Mark Marron, Sailesh R, Subhajit Roy

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science · 2023-10-30 · Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

Human-designed reward functions for reinforcement learning (RL) agents are frequently misaligned with the humans' true, unobservable objectives, and thus act only as proxies. Optimizing for a misspecified proxy reward function often induces…

Artificial Intelligence · Computer Science · 2026-01-30 · Stephane Hatgis-Kessell, Logan Mondal Bhamidipaty, Emma Brunskill

Programming-by-example is the task of synthesizing a program that is consistent with a set of user-provided input-output examples. As examples are often an under-specification of one's intent, a good synthesizer must choose the intended…

Machine Learning · Computer Science · 2025-04-18 · Saujas Vaduguru, Daniel Fried, Yewen Pu

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward…

Machine Learning · Computer Science · 2021-12-22 · Tom Bewley, Freddy Lecue

Decision trees, owing to their interpretability, are attractive as control policies for (dynamical) systems. Unfortunately, constructing, or synthesising, such policies is a challenging task. Previous approaches do so by imitating a…

Artificial Intelligence · Computer Science · 2025-04-23 · Emir Demirović, Christian Schilling, Anna Lukina

Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample…

Artificial Intelligence · Computer Science · 2024-02-29 · Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

Programming-by-example (PBE) is a synthesis paradigm that allows users to generate functions by simply providing input-output examples. While a promising interaction paradigm, synthesis is still too slow for realtime interaction and more…

Machine Learning · Computer Science · 2020-02-10 · Kairo Morton, William Hallahan, Elven Shum, Ruzica Piskac, Mark Santolucito

Offline imitation learning (offline IL) enables training effective policies without requiring explicit reward annotations. Recent approaches attempt to estimate rewards for unlabeled datasets using a small set of expert demonstrations.…

Machine Learning · Computer Science · 2025-11-19 · Shengjie Sun, Jiafei Lyu, Runze Liu, Mengbei Yan, Bo Liu, Deheng Ye, Xiu Li

Process Reward Models (PRMs) have emerged as a promising approach to enhance the reasoning capabilities of large language models (LLMs) by guiding their step-by-step reasoning toward a final answer. However, existing PRMs either treat each…

Machine Learning · Computer Science · 2026-03-02 · Zheng Zhang, Ziwei Shan, Kaitao Song, Yexin Li, Kan Ren

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number…

Artificial Intelligence · Computer Science · 2026-05-19 · André G. Pereira, Augusto B. Corrêa, Jendrik Seipp

Reinforcement learning is increasingly used for code-centric tasks. These tasks include code generation, summarization, understanding, repair, testing, and optimization. This trend is growing faster with large language models and autonomous…

Software Engineering · Computer Science · 2026-01-28 · Md Rayhanul Masud, Azmine Toushik Wasi, Salman Rahman, Md Rizwan Parvez

Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be checked automatically and converted into reliable training signals. Most such…

Machine Learning · Computer Science · 2026-04-06 · Mohammad Rezaei, Jens Lehmann, Sahar Vahdati

Program Synthesis is the task of generating a program from a provided specification. Traditionally, this has been treated as a search problem by the programming languages (PL) community and more recently as a supervised learning problem by…

Artificial Intelligence · Computer Science · 2018-06-11 · Riley Simmons-Edler, Anders Miltner, Sebastian Seung

We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item…

Information Retrieval · Computer Science · 2024-07-08 · Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi, David Rohde, Flavian Vasile

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies…

Machine Learning · Computer Science · 2024-10-10 · Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe

Programming by Example (PBE) is the task of inducing computer programs from input-output examples. It can be seen as a type of machine learning where the hypothesis space is the set of legal programs in some programming language. Recent…

Programming Languages · Computer Science · 2017-03-03 · John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow