Related papers: Execution-based Evaluation for Data Science Code G…

200 papers

To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is…

Software Engineering · Computer Science · 2024-10-04 · Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science · 2023-05-10 · Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

Developing models that can automatically generate detailed code explanation can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code…

Computation and Language · Computer Science · 2022-11-29 · Haotian Cui, Chenglong Wang, Junjie Huang, Jeevana Priya Inala, Todd Mytkowicz, Bo Wang, Jianfeng Gao, Nan Duan

Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by…

Computation and Language · Computer Science · 2024-10-17 · Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

To extend the scope of coding queries to more realistic settings, we propose ODEX, the first Open-Domain EXecution-based natural language (NL) to Python code generation dataset. ODEX has 945 NL-Code pairs spanning 79 diverse libraries,…

Software Engineering · Computer Science · 2023-05-22 · Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation.…

Large language models (LLMs) can generate code from natural language, but the extent to which they capture intended program behavior remains unclear. Executable behavioral specifications, defined via preconditions and postconditions,…

Software Engineering · Computer Science · 2026-04-15 · Zaoyu Chen, Jianbo Dai, Boyu Zhu, Jingdong Wang, Huiming Wang, Xin Xu, Haoyang Yuan, Zhijiang Guo, Xiao-Ming Wu

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science · 2022-10-04 · Usama Nadeem, Noah Ziems, Shaoen Wu

Recent advancements in language modeling have enabled the translation of natural language into code, and the use of execution feedback to improve code generation. However, these methods often rely heavily on pre-existing test cases, which…

Software Engineering · Computer Science · 2024-12-19 · Nan Wang, Yafei Liu, Chen Chen, Haonan Lu

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the…

Software Engineering · Computer Science · 2024-02-26 · Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang, Qi Zhang, Yuchi Ma, Guangtai Liang, Ying Li, Qianxiang Wang, Tao Xie

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science · 2022-11-24 · Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science · 2024-09-06 · Yihong Dong, Jiazheng Ding, Xue Jiang, Ge Li, Zhuo Li, Zhi Jin

Given recent advancements of Large Language Models (LLMs), code generation tasks attract immense attention for wide application in different domains. In an effort to evaluate and select a best model to automatically remediate system…

Computation and Language · Computer Science · 2024-12-18 · Ngoc Phuoc An Vo, Brent Paulovicks, Vadim Sheinin

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code…

Computation and Language · Computer Science · 2026-04-07 · Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

The complexity of modern software has led to a drastic increase in the time and cost associated with detecting and rectifying software bugs. In response, researchers have explored various methods to automatically generate fixes for buggy…

Software Engineering · Computer Science · 2023-03-31 · Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, Chris Brown

Diffusion models have achieved remarkable performance in generative modeling, yet their theoretical foundations are often intricate, and the gap between mathematical formulations in papers and practical open-source implementations can be…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-09 · Cheng Yu

We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty…

Computation and Language · Computer Science · 2018-09-17 · Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh Singh

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word…

Computation and Language · Computer Science · 2024-11-05 · Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark

Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully…

Software Engineering · Computer Science · 2023-06-14 · Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

Existing datasets for coding agents evaluate performance on isolated, single pull request (PR) tasks in a stateless manner, failing to capture the reality of real-world software development where code changes accumulate, technical debt…

Software Engineering · Computer Science · 2026-04-06 · KN Ajay Shastry, Ganesh Senrayan, Shrey Satapara, Pranoy Panda, Chaitanya Devaguptapu