Related papers: Evaluating Large Language Models Trained on Code

Exploring the Robustness of Large Language Models for Solving Programming Problems

Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However,…

Computation and Language · Computer Science · 2023-06-27 · Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki

Codex Hacks HackerRank: Memorization Issues and a Framework for Code Synthesis Evaluation

The Codex model has demonstrated extraordinary competence in synthesizing code from natural language problem descriptions. However, in order to reveal unknown failure modes and hidden biases, such large-scale models must be systematically…

Software Engineering · Computer Science · 2022-12-07 · Anjan Karmakar, Julian Aron Prenner, Marco D'Ambros, Romain Robbes

Using Large Language Models to Generate JUnit Tests: An Empirical Study

A code generation model generates code by taking a prompt from a code comment, existing code, or a combination of both. Although code generation models (e.g., GitHub Copilot) are increasingly being adopted in practice, it is unclear whether…

Software Engineering · Computer Science · 2024-08-29 · Mohammed Latif Siddiq, Joanna C. S. Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, Vinicius Carvalho Lopes

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science · 2022-05-05 · Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

CodeT: Code Generation with Generated Tests

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science · 2022-11-24 · Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

GitHub Copilot: the perfect Code compLeeter?

This paper aims to evaluate GitHub Copilot's generated code quality based on the LeetCode problem set using a custom automated framework. We evaluate the results of Copilot for 4 programming languages: Java, C++, Python3 and Rust. We aim to…

Software Engineering · Computer Science · 2024-06-18 · Ilja Siroš, Dave Singelée, Bart Preneel

Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs

OpenAI's Codex, a GPT-3-like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically…

Software Engineering · Computer Science · 2021-11-09 · Julian Aron Prenner, Romain Robbes

Jigsaw: Large Language Models meet Program Synthesis

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science · 2021-12-07 · Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study

AI-powered code generation models have been developing rapidly, allowing developers to expedite code generation and thus improve their productivity. These models are trained on large corpora of code (primarily sourced from public…

Software Engineering · Computer Science · 2023-11-21 · Vahid Majdinasab, Michael Joshua Bishop, Shawn Rasheed, Arghavan Moradidakhel, Amjed Tahir, Foutse Khomh

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we…

Machine Learning · Computer Science · 2024-07-11 · Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

HumanEval on Latest GPT Models -- 2024

In 2023, we are using the latest models of GPT-4 to advance program synthesis. The large language models have significantly improved the state-of-the-art for this purpose. To make these advancements more accessible, we have created a…

Computation and Language · Computer Science · 2024-02-26 · Daniel Li, Lincoln Murr

Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming

GitHub Copilot, an extension for the Visual Studio Code development environment powered by the large-scale language model Codex, makes automatic program synthesis available for software developers. This model has been extensively studied in…

Software Engineering · Computer Science · 2021-11-16 · Dominik Sobania, Martin Briesch, Franz Rothlauf

The Program Testing Ability of Large Language Models for Code

Recent development of large language models (LLMs) for code like CodeX and CodeT5+ demonstrates tremendous promise in achieving code intelligence. Their ability to synthesize code that completes a program for performing a pre-defined task…

Computation and Language · Computer Science · 2023-10-10 · Weimin Xiong, Yiwen Guo, Hao Chen

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Computation and Language · Computer Science · 2024-03-12 · Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science · 2025-04-03 · Nam Huynh, Beiyu Lin

On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

Software engineering research has always been concerned with the improvement of code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step…

Software Engineering · Computer Science · 2023-02-02 · Antonio Mastropaolo, Luca Pascarella, Emanuela Guglielmi, Matteo Ciniselli, Simone Scalabrino, Rocco Oliveto, Gabriele Bavota

Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study

Modern code generation tools utilizing AI models like Large Language Models (LLMs) have gained increased popularity due to their ability to produce functional code. However, their usage presents security challenges, often resulting in…

Software Engineering · Computer Science · 2025-02-07 · Yujia Fu, Peng Liang, Amjed Tahir, Zengyang Li, Mojtaba Shahin, Jiaxin Yu, Jinfu Chen

Examination of Code generated by Large Language Models

Large language models (LLMs), such as ChatGPT and Copilot, are transforming software development by automating code generation and, arguably, enabling rapid prototyping, supporting education, and boosting productivity. Therefore, correctness and…

Software Engineering · Computer Science · 2024-08-30 · Robin Beer, Alexander Feix, Tim Guttzeit, Tamara Muras, Vincent Müller, Maurice Rauscher, Florian Schäffler, Welf Löwe

Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study

Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages…

Computation and Language · Computer Science · 2024-03-25 · Tim van Dam, Frank van der Heijden, Philippe de Bekker, Berend Nieuwschepen, Marc Otten, Maliheh Izadi

Stochastic Code Generation

Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code and understanding higher-level or system-level specifications. This issue is also observed in…

Computation and Language · Computer Science · 2023-04-18 · Swapnil Sharma, Nikita Anand, Kranthi Kiran G.