Related papers: Program Synthesis with Large Language Models

Dr. Boot: Bootstrapping Program Synthesis Language Models to Perform Repairing

Language models for program synthesis are usually trained and evaluated on programming competition datasets (MBPP, APPS). However, these datasets are limited in size and quality, while these language models are extremely data hungry.…

Software Engineering · Computer Science 2025-07-23 Noah van der Vleuten

MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0% pass rate on HumanEval. However, this draws into question the adequacy…

Computation and Language · Computer Science 2025-08-19 Jianbo Dai, Jianqiao Lu, Yunlong Feng, Guangtao Zeng, Rongju Ruan, Ming Cheng, Dong Huang, Haochen Tan, Zhijiang Guo

PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs

Driven by the surge in code generation using large language models (LLMs), numerous benchmarks have emerged to evaluate these LLMs' capabilities. We conducted a large-scale human evaluation of HumanEval and MBPP, two popular benchmarks for…

Computation and Language · Computer Science 2024-07-08 Ankit Yadav, Himanshu Beniwal, Mayank Singh

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Software engineers mainly write code by editing existing programs. In contrast, language models (LMs) autoregressively synthesize programs in a single pass. One explanation for this is the scarcity of sequential edit data. While…

Machine Learning · Computer Science 2025-02-12 Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

Jigsaw: Large Language Models meet Program Synthesis

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science 2021-12-07 Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

Measuring Coding Challenge Competence With APPS

While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating…

Software Engineering · Computer Science 2021-11-10 Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

Function-constrained Program Synthesis

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science 2023-12-05 Patrick Hajali, Ignas Budvytis

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models advances the…

Machine Learning · Computer Science 2023-03-01 Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

Multi-lingual Evaluation of Code Generation Models

We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles…

Machine Learning · Computer Science 2023-03-30 Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science 2025-10-28 Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

Synthetic Data Generation Using Large Language Models: Advances in Text and Code

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment…

Computation and Language · Computer Science 2025-11-21 Mihai Nadas, Laura Diosan, Andreea Tomescu

Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment

Large language and multimodal models have shown remarkable success on various benchmarks focused on specific skills such as general-purpose programming, math word problem-solving, and visual question answering. However, it is unclear how…

Artificial Intelligence · Computer Science 2025-10-07 Chao Wen, Jacqueline Staub, Adish Singla

Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming

Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to…

Programming Languages · Computer Science 2024-09-06 Saikat Chakraborty, Gabriel Ebner, Siddharth Bhat, Sarah Fakhoury, Sakina Fatima, Shuvendu Lahiri, Nikhil Swamy

Towards AI-Assisted Synthesis of Verified Dafny Methods

Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models…

Software Engineering · Computer Science 2024-06-12 Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble

LLM4TDD: Best Practices for Test Driven Development Using Large Language Models

In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating…

Software Engineering · Computer Science 2023-12-11 Sanyogita Piya, Allison Sullivan

Large Language Models Synergize with Automated Machine Learning

Recently, program synthesis driven by large language models (LLMs) has become increasingly popular. However, program synthesis for machine learning (ML) tasks still poses significant challenges. This paper explores a novel form of program…

Software Engineering · Computer Science 2024-09-10 Jinglue Xu, Jialong Li, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei

A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair

Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion.…

Software Engineering · Computer Science 2024-04-18 Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong Fang, Bowen Yu, Weisong Sun, Zhenyu Chen

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

Large Language Models are traditionally finetuned on large instruction datasets. However, recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding…

Machine Learning · Computer Science 2023-12-29 Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure…

Software Engineering · Computer Science 2023-11-01 Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, Lingming Zhang

The Program Testing Ability of Large Language Models for Code

Recent development of large language models (LLMs) for code like CodeX and CodeT5+ demonstrates tremendous promise in achieving code intelligence. Their ability to synthesize code that completes a program for performing a pre-defined task…

Computation and Language · Computer Science 2023-10-10 Weimin Xiong, Yiwen Guo, Hao Chen