Related papers: Self-Execution Simulation Improves Coding Models

Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs

Code-generating Large Language Models (LLMs) have become essential tools in modern software development, enhancing productivity and accelerating development. This paper aims to investigate the fine-tuning of code-generating LLMs using…

Software Engineering · Computer Science 2025-05-06 Marina Sakharova, Abhinav Anand, Mira Mezini

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs

Code LLMs have shown promising results with converting tasks in natural language to programs that can be executed by service robots. We are interested in finetuning small, specialized LLMs for this purpose, but collecting datasets of…

Computation and Language · Computer Science 2025-10-13 Zichao Hu, Junyi Jessy Li, Arjun Guha, Joydeep Biswas

Can Large Language Models Solve Path Constraints in Symbolic Execution?

Symbolic execution is an important software analysis technique which benefits downstream tasks such as software testing and debugging. However, several limitations hinder symbolic execution from application on real-world software. One of…

Software Engineering · Computer Science 2025-11-25 Wenhan Wang, Kaibo Liu, Zeyu Sun, An Ran Chen, Ge Li, Gang Huang, Lei Ma

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to…

Computation and Language · Computer Science 2025-02-19 Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Quentin Carbonneaux, Taco Cohen, Gabriel Synnaeve

Language Models Can Teach Themselves to Program Better

Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is…

Machine Learning · Computer Science 2023-04-13 Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai

Executing Natural Language-Described Algorithms with Large Language Models: An Investigation

Executing computer programs described in natural language has long been a pursuit of computer science. With the advent of enhanced natural language understanding capabilities exhibited by large language models (LLMs), the path toward this…

Computation and Language · Computer Science 2024-03-15 Xin Zheng, Qiming Zhu, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

Investigating Execution-Aware Language Models for Code Optimization

Code optimization is the process of enhancing code efficiency, while preserving its intended functionality. This process often requires a deep understanding of the code execution behavior at run-time to identify and address inefficiencies…

Software Engineering · Computer Science 2026-04-02 Federico Di Menna, Luca Traini, Gabriele Bavota, Vittorio Cortellessa

What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces

Code generation and understanding are critical capabilities for large language models (LLMs). Thus, most LLMs are pretrained and fine-tuned on code data. However, these datasets typically treat code as static strings and rarely exploit the…

Machine Learning · Computer Science 2025-03-11 Jordi Armengol-Estapé, Quentin Carbonneaux, Tianjun Zhang, Aram H. Markosyan, Volker Seeker, Chris Cummins, Melanie Kambadur, Michael F. P. O'Boyle, Sida Wang, Gabriel Synnaeve, Hugh James Leather

DOCE: Finding the Sweet Spot for Execution-Based Code Generation

Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by…

Computation and Language · Computer Science 2024-10-17 Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

Code Execution as Grounded Supervision for LLM Reasoning

Training large language models (LLMs) with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. However, obtaining reliable and accurate reasoning supervision remains a significant challenge. We…

Computation and Language · Computer Science 2025-10-21 Dongwon Jung, Wenxuan Zhou, Muhao Chen

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for…

Computation and Language · Computer Science 2024-04-04 Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

Code Simulation Challenges for Large Language Models

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can…

Machine Learning · Computer Science 2024-06-13 Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Learning Autocompletion from Real-World Datasets

Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study When…

Software Engineering · Computer Science 2020-11-10 Gareth Ari Aye, Seohyun Kim, Hongyu Li

Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning

In this work, we study the problem of code generation with a large language model (LLM), with a focus on generating SQL queries from natural language questions. We ask: Instead of using supervised fine tuning with text-code pairs, can we…

Computation and Language · Computer Science 2025-06-09 Atharv Kulkarni, Vivek Srikumar

NExT: Teaching Large Language Models to Reason about Code Execution

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck…

Machine Learning · Computer Science 2024-04-24 Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, Pengcheng Yin

The Self-Execution Benchmark: Measuring LLMs' Attempts to Overcome Their Lack of Self-Execution

Large language models (LLMs) are commonly evaluated on tasks that test their knowledge or reasoning abilities. In this paper, we explore a different type of evaluation: whether an LLM can predict aspects of its own responses. Since LLMs…

Computation and Language · Computer Science 2025-08-19 Elon Ezra, Ariel Weizman, Amos Azaria

Learning Performance-Improving Code Edits

With the decline of Moore's law, optimizing program performance has become a major focus of software research. However, high-level optimizations such as API and algorithm changes remain elusive due to the difficulty of understanding the…

Software Engineering · Computer Science 2024-04-29 Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, Amir Yazdanbakhsh

Revisit Self-Debugging with Self-Generated Tests for Code Generation

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen, Zhengwei Tao, Kechi Zhang, Changzhi Zhou, Wanli Gu, Yuanpeng He, Mengdi Zhang, Xunliang Cai, Haiyan Zhao, Zhi Jin

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such as performant CUDA kernels and competition-level…

Machine Learning · Computer Science 2026-01-12 Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

Self-Edit: Fault-Aware Code Editor for Code Generation

Large language models (LLMs) have demonstrated an impressive ability to generate codes on competitive programming tasks. However, with limited sample numbers, LLMs still suffer from poor accuracy. Inspired by the process of human…

Software Engineering · Computer Science 2023-09-12 Kechi Zhang, Zhuo Li, Jia Li, Ge Li, Zhi Jin