
Related papers: $\mathcal{B}$-Coder: Value-Based Deep Reinforcemen…

200 papers

Reinforcement Learning (RL) has emerged as a popular training paradigm, particularly when paired with reasoning models. While effective, it primarily focuses on generating responses and lacks mechanisms to explicitly foster critique or…

Computation and Language · Computer Science 2026-03-13 Chi Ruan, Dongfu Jiang, Yubo Wang, Wenhu Chen

Reinforcement learning (RL) algorithms assume that users specify tasks by manually writing down a reward function. However, this process can be laborious and demands considerable technical expertise. Can we devise RL algorithms that instead…

Machine Learning · Computer Science 2022-01-03 Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data/model in the code domain.…

Software Engineering · Computer Science 2025-05-27 Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen

Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty…

Machine Learning · Computer Science 2022-02-02 Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim

Program synthesis is the task of automatically generating a program consistent with a specification. Recent years have seen proposal of a number of neural approaches for program synthesis, many of which adopt a sequence generation paradigm…

Machine Learning · Computer Science 2018-05-23 Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli

Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical…

Machine Learning · Computer Science 2022-11-04 Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

In practice, rigorous reasoning is often a key driver of correct code, while Reinforcement Learning (RL) for code generation often neglects optimizing reasoning quality. Bringing process-level supervision into RL is appealing, but it faces…

Software Engineering · Computer Science 2026-05-06 Lishui Fan, Yu Zhang, Mouxiang Chen, Zhongxin Liu

Improving data utilization efficiency is critical for scaling reinforcement learning (RL) for long-horizon tasks where generating trajectories is expensive. However, the dominant RL methods for LLMs are largely on-policy: they update each…

Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a…

Machine Learning · Computer Science 2021-11-05 Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel

Trustworthy verifiers are essential for the success of reinforcement learning with verifiable reward (RLVR), which is the core methodology behind various large reasoning models such as DeepSeek-R1. In complex domains like mathematical…

Machine Learning · Computer Science 2025-10-08 Yuzhen Huang, Weihao Zeng, Xingshan Zeng, Qi Zhu, Junxian He

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most of methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

Reinforcement learning (RL) with unit test feedback has enhanced large language models' (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental…

Artificial Intelligence · Computer Science 2025-02-05 Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei, Wenlei Shi, Xing Jin, Guanlin Liu, Chen Dun, Liang Huang, Lin Yan

Existing reinforcement learning strategies based on outcome supervision have proven effective in enhancing the performance of large language models (LLMs) for code generation. While reinforcement learning based on process supervision has…

Software Engineering · Computer Science 2025-02-05 Yufan Ye, Ting Zhang, Wenbin Jiang, Hua Huang

Recently DeepSeek R1 has shown that reinforcement learning (RL) can substantially improve the reasoning capabilities of Large Language Models (LLMs) through a simple yet effective design. The core of R1 lies in its rule-based reward…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Haozhan Shen, Peng Liu, Jingcheng Li, Chunxin Fang, Yibo Ma, Jiajia Liao, Qiaoli Shen, Zilun Zhang, Kangjia Zhao, Qianqian Zhang, Ruochen Xu, Tiancheng Zhao

Mapping natural language instructions to programs that computers can process is a fundamental challenge. Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward. In…

Computation and Language · Computer Science 2021-10-05 Sayan Ghosh, Shashank Srivastava

Code-generating Large Language Models (LLMs) have become essential tools in modern software development, enhancing productivity and accelerating development. This paper aims to investigate the fine-tuning of code-generating LLMs using…

Software Engineering · Computer Science 2025-05-06 Marina Sakharova, Abhinav Anand, Mira Mezini

Large language models show strong potential for automated code generation, but lack guarantees for correctness, quality, safety, and domain-specific constraints. For instance in robotics, where code generation is increasingly being used for…

Machine Learning · Computer Science 2026-05-21 Erfan Aghadavoodi Jolfaei, Daniel Maninger, Abhinav Anand, Mert Tiftikci, Mira Mezini

The automatic synthesis of a policy through reinforcement learning (RL) from a given set of formal requirements depends on the construction of a reward signal and consists of the iterative application of many policy-improvement steps. The…

Machine Learning · Computer Science 2022-10-21 Luigi Berducci, Radu Grosu

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support…

Machine Learning · Computer Science 2024-02-08 David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand

Protein sequence design, determined by amino acid sequences, are essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but often…
