Related papers: DOCE: Finding the Sweet Spot for Execution-Based C…

200 papers

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen, Zhengwei Tao, Kechi Zhang, Changzhi Zhou, Wanli Gu, Yuanpeng He, Mengdi Zhang, Xunliang Cai, Haiyan Zhao, Zhi Jin

Large Language Models (LLMs), such as GPT-4, StarCoder, and CodeLlama, are transforming the way developers approach programming by automatically generating code based on given natural language descriptions. Despite advancements, generating…

Software Engineering · Computer Science 2024-09-20 Zhihong Sun, Yao Wan, Jia Li, Hongyu Zhang, Zhi Jin, Ge Li, Chen Lyu

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code…

Computation and Language · Computer Science 2026-04-07 Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science 2024-09-06 Yihong Dong, Jiazheng Ding, Xue Jiang, Ge Li, Zhuo Li, Zhi Jin

To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is…

Software Engineering · Computer Science 2024-10-04 Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly…

Computation and Language · Computer Science 2022-11-02 Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang

Decompilation -- recovering source code from compiled binaries -- is essential for security analysis, malware reverse engineering, and legacy software maintenance. However, existing decompilers produce code that often fails to compile or…

Software Engineering · Computer Science 2026-05-05 Yifan Zhang, Xiaohan Wang, Yueke Zhang, Yu Huang, Kevin Leach

The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases…

Machine Learning · Computer Science 2023-09-04 Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin

The use of large language models (LLMs) for automated code generation has emerged as a significant focus within AI research. As these pretrained models continue to evolve, their ability to understand and generate complex code structures has…

Software Engineering · Computer Science 2025-05-06 Nazmus Ashrafi, Salah Bouktif, Mohammed Mediani

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging…

Computation and Language · Computer Science 2025-02-17 Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Binta Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

Code Executing Reasoning is becoming a new non-functional metric that assesses the ability of large language models (LLMs) in programming tasks. State-of-the-art frameworks (CodeMind or REval) and benchmarks (CruxEval) usually focus on…

Software Engineering · Computer Science 2025-01-31 Changshu Liu, Reyhaneh Jabbarvand

The rise of large language models (LLMs) has introduced transformative potential in automated code generation, addressing a wide range of software engineering challenges. However, empirical evaluation of LLM-based code generation lacks…

Software Engineering · Computer Science 2025-10-07 Nathalia Nascimento, Everton Guimaraes, Paulo Alencar

Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs.…

Software Engineering · Computer Science 2024-06-12 Li Zhong, Zilong Wang, Jingbo Shang

We present a novel approach to neural code generation that incorporates real-time execution signals into the language model generation process. While large language models (LLMs) have demonstrated impressive code generation capabilities,…

Machine Learning · Computer Science 2025-10-24 Boaz Lavon, Shahar Katz, Lior Wolf

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we…

Machine Learning · Computer Science 2024-05-07 Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such as performant CUDA kernels and competition-level…

Machine Learning · Computer Science 2026-01-12 Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

Binary decompilation plays an important role in software security analysis, reverse engineering, and malware understanding when source code is unavailable. However, existing decompilation techniques often fail to produce source code that…

Software Engineering · Computer Science 2026-04-14 Xiaohan Wang, Yuxin Hu, Kevin Leach

Given recent advancements of Large Language Models (LLMs), code generation tasks attract immense attention for wide application in different domains. In an effort to evaluate and select a best model to automatically remediate system…

Computation and Language · Computer Science 2024-12-18 Ngoc Phuoc An Vo, Brent Paulovicks, Vadim Sheinin

Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in…

Computation and Language · Computer Science 2024-10-11 Siddhant Waghjale, Vishruth Veerendranath, Zora Zhiruo Wang, Daniel Fried

This work addresses test output prediction, a key challenge in test case generation. To improve the reliability of predicted outputs by LLMs, prior approaches generate code first to ground predictions. One grounding strategy is direct…

Software Engineering · Computer Science 2026-04-14 Hojae Han, Jaejin Kim, Seung-won Hwang, Yu Jin Kim, Moontae Lee