Related papers: Execution-Based Evaluation for Open-Domain Code Ge…

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing…

Computation and Language · Computer Science 2020-04-21 Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig

Execution-based Evaluation for Data Science Code Generation Models

Code generation models can benefit data scientists' productivity by automatically generating code from context and text descriptions. An important measure of the modeling progress is whether a model can generate code that can correctly…

Software Engineering · Computer Science 2022-11-18 Junjie Huang, Chenglong Wang, Jipeng Zhang, Cong Yan, Haotian Cui, Jeevana Priya Inala, Colin Clement, Nan Duan, Jianfeng Gao

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we…

Machine Learning · Computer Science 2024-07-11 Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is…

Software Engineering · Computer Science 2024-10-04 Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric. This creates a barrier for…

Computation and Language · Computer Science 2023-02-08 Zhiruo Wang, Grace Cuenca, Shuyan Zhou, Frank F. Xu, Graham Neubig

DocCGen: Document-based Controlled Code Generation

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical…

Software Engineering · Computer Science 2024-07-04 Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating codes from natural language descriptions, repairing buggy codes, translating codes between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science 2023-11-07 Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

On Codex Prompt Engineering for OCL Generation: An Empirical Study

The Object Constraint Language (OCL) is a declarative language that adds constraints and object query expressions to MOF models. Despite its potential to provide precision and conciseness to UML models, the unfamiliar syntax of OCL has…

Software Engineering · Computer Science 2023-03-30 Seif Abukhalaf, Mohammad Hamdaqa, Foutse Khomh

Natural Language to Code Translation with Execution

Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly…

Computation and Language · Computer Science 2022-11-02 Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang

Execution-Based Evaluation of Natural Language to Bash and PowerShell for Incident Remediation

Given recent advancements of Large Language Models (LLMs), code generation tasks attract immense attention for wide application in different domains. In an effort to evaluate and select a best model to automatically remediate system…

Computation and Language · Computer Science 2024-12-18 Ngoc Phuoc An Vo, Brent Paulovicks, Vadim Sheinin

HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization

Large language models (LLMs) have made significant progress in generating codes from textual prompts. However, existing benchmarks have mainly concentrated on translating English prompts to multilingual codes or have been constrained to…

Computation and Language · Computer Science 2024-03-26 Qiwei Peng, Yekun Chai, Xuhong Li

OpenClassGen: A Large-Scale Corpus of Real-World Python Classes for LLM Research

Existing class-level code generation datasets are either synthetic (ClassEval: 100 classes) or insufficient in scale for modern training needs (RealClassEval: 400 classes), hindering robust evaluation and empirical analysis. We present…

Software Engineering · Computer Science 2026-05-01 Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab

Learning to Reason via Program Generation, Emulation, and Search

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word…

Computation and Language · Computer Science 2024-11-05 Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark

GenX: Mastering Code and Test Generation with Execution Feedback

Recent advancements in language modeling have enabled the translation of natural language into code, and the use of execution feedback to improve code generation. However, these methods often rely heavily on pre-existing test cases, which…

Software Engineering · Computer Science 2024-12-19 Nan Wang, Yafei Liu, Chen Chen, Haonan Lu

Natural Language to Code Generation in Interactive Data Science Notebooks

Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that…

Computation and Language · Computer Science 2022-12-20 Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov, Charles Sutton

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of…

Software Engineering · Computer Science 2023-03-17 Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge…

Machine Learning · Computer Science 2022-12-20 Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

Large language models (LLMs) can generate code from natural language, but the extent to which they capture intended program behavior remains unclear. Executable behavioral specifications, defined via preconditions and postconditions,…

Software Engineering · Computer Science 2026-04-15 Zaoyu Chen, Jianbo Dai, Boyu Zhu, Jingdong Wang, Huiming Wang, Xin Xu, Haoyang Yuan, Zhijiang Guo, Xiao-Ming Wu

CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the…

Software Engineering · Computer Science 2024-02-26 Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang, Qi Zhang, Yuchi Ma, Guangtai Liang, Ying Li, Qianxiang Wang, Tao Xie