Related papers: IRCoCo: Immediate Rewards-Guided Deep Reinforcemen…

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to…

Software Engineering · Computer Science · 2025-06-04 · Mingzhe Du, Luu Anh Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng

Reinforcement Learning for LLM Post-Training: A Survey

Large language models (LLMs) trained via pretraining and supervised fine-tuning (SFT) can still produce harmful and misaligned outputs, or struggle in domains like math and coding. Reinforcement learning (RL)-based post-training methods,…

Computation and Language · Computer Science · 2026-05-19 · Zhichao Wang, Kiran Ramnath, Bin Bi, Shiva Kumar Pentyala, Sougata Chaudhuri, Shubham Mehrotra, Zixu Zhu, Xiang-Bo Mao, Sitaram Asur, Na Cheng

ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding

With respect to improving the reasoning accuracy of LLMs, the representative reinforcement learning (RL) method GRPO faces failure due to insignificant reward variance, while verification methods based on process reward models (PRMs) suffer…

Artificial Intelligence · Computer Science · 2025-09-09 · Sining Zhoubian, Dan Zhang, Jie Tang

Reinforced Latent Reasoning for LLM-based Recommendation

Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks, sparking growing interest in their application to preference reasoning in recommendation systems. Existing methods typically…

Artificial Intelligence · Computer Science · 2025-10-27 · Yang Zhang, Wenxin Xu, Xiaoyan Zhao, Wenjie Wang, Fuli Feng, Xiangnan He, Tat-Seng Chua

Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs

Code-generating Large Language Models (LLMs) have become essential tools in modern software development, enhancing productivity and accelerating development. This paper aims to investigate the fine-tuning of code-generating LLMs using…

Software Engineering · Computer Science · 2025-05-06 · Marina Sakharova, Abhinav Anand, Mira Mezini

Reward Is Enough: LLMs Are In-Context Reinforcement Learners

Reinforcement learning (RL) is a framework for solving sequential decision-making problems. In this work, we demonstrate that, surprisingly, RL emerges during the inference time of large language models (LLMs), a phenomenon we term…

Machine Learning · Computer Science · 2026-04-28 · Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Shangtong Zhang, Yanjun Qi

Reinforcement Learning Enhanced LLMs: A Survey

Reinforcement learning (RL) enhanced large language models (LLMs), particularly exemplified by DeepSeek-R1, have exhibited outstanding performance. Despite the effectiveness in improving LLM capabilities, its implementation remains highly…

Computation and Language · Computer Science · 2025-02-25 · Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy

Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning

Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains.…

Artificial Intelligence · Computer Science · 2025-05-27 · Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang

Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment

Alignment is vital for safely deploying large language models (LLMs). Existing techniques are either reward-based (training a reward model on preference pairs and optimizing with reinforcement learning) or reward-free (directly fine-tuning…

Computation and Language · Computer Science · 2026-03-03 · Ruoxi Cheng, Haoxuan Ma, Weixin Wang, Ranjie Duan, Jiexi Liu, Xiaoshuang Jia, Simeng Qin, Xiaochun Cao, Yang Liu, Xiaojun Jia

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories.…

Software Engineering · Computer Science · 2024-05-31 · Wei Cheng, Yuhan Wu, Wei Hu

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models for advanced mathematical reasoning and coding. Following the success of frontier reasoning models, recent work has demonstrated that…

Machine Learning · Computer Science · 2025-08-11 · Rosie Zhao, Alexandru Meterez, Sham Kakade, Cengiz Pehlevan, Samy Jelassi, Eran Malach

Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization

Large Vision-Language Models (LVLMs) or multimodal large language models represent a significant advancement in artificial intelligence, enabling systems to understand and generate content across both visual and textual modalities. While…

Machine Learning · Computer Science · 2025-09-09 · Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs

Large language models (LLMs) acquire extensive prior knowledge through large-scale pretraining and can be further enhanced via supervised fine-tuning (SFT) or reinforcement learning (RL)-based post-training. A growing body of evidence has…

Machine Learning · Computer Science · 2026-01-28 · Honglin Zhang, Qianyue Hao, Fengli Xu, Yong Li

Stable Reinforcement Learning for Efficient Reasoning

The success of DeepSeek-R1 has drawn the LLM community's attention to reinforcement learning (RL) methods like GRPO. However, such rule-based 0/1 outcome reward methods lack the capability to regulate the intermediate reasoning processes…

Artificial Intelligence · Computer Science · 2025-05-26 · Muzhi Dai, Shixuan Liu, Qingyi Si

Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models pushed the boundaries of code completion by redefining what these coding assistants can do: We…

Software Engineering · Computer Science · 2025-01-10 · Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

Reinforcement learning fine-tuning of language model for instruction following and math reasoning

This study investigates the effectiveness of reinforcement learning (RL) fine-tuning techniques on a compact language model (Qwen2.5-0.5B Base) for two challenging tasks: instruction following and mathematical reasoning. We compare…

Computation and Language · Computer Science · 2025-07-29 · Yifu Han, Geo Zhang

Effective Reinforcement Learning for Reasoning in Language Models

Reinforcement learning (RL) has emerged as a promising strategy for improving the reasoning capabilities of language models (LMs) in domains such as mathematics and coding. However, most modern RL algorithms were designed to target robotics…

Artificial Intelligence · Computer Science · 2025-05-26 · Lianghuan Huang, Shuo Li, Sagnik Anupam, Insup Lee, Osbert Bastani

A Technical Survey of Reinforcement Learning Techniques for Large Language Models

Reinforcement Learning (RL) has emerged as a transformative approach for aligning and enhancing Large Language Models (LLMs), addressing critical challenges in instruction following, ethical alignment, and reasoning capabilities. This…

Artificial Intelligence · Computer Science · 2025-07-08 · Saksham Sahai Srivastava, Vaneet Aggarwal

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by…

Software Engineering · Computer Science · 2026-04-23 · Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li

Adaptive Reward Design for Reinforcement Learning

There is a surge of interest in using formal languages such as Linear Temporal Logic (LTL) to precisely and succinctly specify complex tasks and derive reward functions for Reinforcement Learning (RL). However, existing methods often assign…

Robotics · Computer Science · 2025-05-20 · Minjae Kwon, Ingy ElSayed-Aly, Lu Feng