Related papers: OpenCodeInterpreter: Integrating Code Generation w…

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce…

Software Engineering · Computer Science 2024-01-29 Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang

ReCode: Robustness Evaluation of Code Generation Models

Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in…

Machine Learning · Computer Science 2022-12-21 Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

Design and Implementation of Code Completion System Based on LLM and CodeBERT Hybrid Subsystem

In the rapidly evolving industry of software development, coding efficiency and accuracy play significant roles in delivering high-quality software. Various code suggestion and completion tools, such as CodeBERT from Microsoft and GPT-3.5…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-11 Bingbing Zhang, Ziyu Lin, Yingxin Su

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and…

Computation and Language · Computer Science 2023-10-06 Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code…

Software Engineering · Computer Science 2024-05-27 Bin Lei, Yuchen Li, Qiuwu Chen

Large Language Models as Code Executors: An Exploratory Study

The capabilities of Large Language Models (LLMs) have significantly evolved, extending from natural language processing to complex tasks like code understanding and generation. We expand the scope of LLMs' capabilities to a broader context,…

Computation and Language · Computer Science 2024-10-11 Chenyang Lyu, Lecheng Yan, Rui Xing, Wenxi Li, Younes Samih, Tianbo Ji, Longyue Wang

Generating High-Quality Datasets for Code Editing via Open-Source Language Models

Code editing plays a vital role in software engineering, requiring developers to adjust existing code according to natural language instructions while keeping functionality intact and avoiding unnecessary modifications. However,…

Software Engineering · Computer Science 2025-10-08 Zekai Zhang, Mingwei Liu, Zhenxi Chen, Linxi Liang, Yuxuan Chen, Guangsheng Ou, Yanlin Wang, Dan Li, Xin Peng, Zibin Zheng

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we…

Machine Learning · Computer Science 2021-07-15 Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

OctoPack: Instruction Tuning Code Large Language Models

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…

Computation and Language · Computer Science 2024-02-20 Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

HumanEval on Latest GPT Models -- 2024

In 2023, we are using the latest models of GPT-4 to advance program synthesis. The large language models have significantly improved the state-of-the-art for this purpose. To make these advancements more accessible, we have created a…

Computation and Language · Computer Science 2024-02-26 Daniel Li, Lincoln Murr

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They…

Software Engineering · Computer Science 2025-01-03 Zhaojian Yu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding…

Computation and Language · Computer Science 2024-05-27 Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui

CodeExp: Explanatory Code Document Generation

Developing models that can automatically generate detailed code explanation can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code…

Computation and Language · Computer Science 2022-11-29 Haotian Cui, Chenglong Wang, Junjie Huang, Jeevana Priya Inala, Todd Mytkowicz, Bo Wang, Jianfeng Gao, Nan Duan

Comparing large language models and human programmers for generating programming code

We systematically evaluated the performance of seven large language models in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other large language…

Software Engineering · Computer Science 2025-01-22 Wenpin Hou, Zhicheng Ji

Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation

Small language models (1-3B) are practical to run locally, but individually limited on harder code generation tasks. We ask whether composing them into pipelines can recover some of that lost capability. We study code generation pipelines…

Software Engineering · Computer Science 2026-04-27 Charles Junichi McAndrews

FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks

Code repair is a fundamental task in software development, facilitating efficient bug resolution and software maintenance. Although large language models (LLMs) have demonstrated considerable potential in automated code repair, their…

Software Engineering · Computer Science 2026-02-27 Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

CodeT: Code Generation with Generated Tests

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science 2022-11-24 Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate…

Software Engineering · Computer Science 2024-06-19 DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang

Compilable Neural Code Generation with Compiler Feedback

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code…

Computation and Language · Computer Science 2022-03-11 Xin Wang, Yasheng Wang, Yao Wan, Fei Mi, Yitong Li, Pingyi Zhou, Jin Liu, Hao Wu, Xin Jiang, Qun Liu

OpenAi's GPT4 as coding assistant

Lately, Large Language Models have been widely used in code generation. GPT4 is considered the most potent Large Language Model from Openai. In this paper, we examine GPT3.5 and GPT4 as coding assistants. More specifically, we have…

Artificial Intelligence · Computer Science 2023-09-25 Lefteris Moussiades, George Zografos