Related papers: AutoCoder: Enhancing Code Large Language Model wit…

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In…

Computation and Language · Computer Science 2025-05-28 Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering

We propose a novel and challenging benchmark, AutoEval-Video, to comprehensively evaluate large vision-language models in open-ended video question answering. The comprehensiveness of AutoEval-Video is demonstrated in two aspects: 1)…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Xiuyuan Chen, Yuan Lin, Yuchen Zhang, Weiran Huang

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address…

Software Engineering · Computer Science 2025-01-08 Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and…

Computation and Language · Computer Science 2023-10-06 Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li

SelfEvolve: A Code Evolution Framework via Large Language Models

Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data. However, while various methods have been proposed to augment LLMs with retrieved knowledge and enhance the…

Computation and Language · Computer Science 2023-06-06 Shuyang Jiang, Yuhao Wang, Yu Wang

RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive…

Robotics · Computer Science 2024-06-07 Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we…

Machine Learning · Computer Science 2021-07-15 Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding…

Computation and Language · Computer Science 2024-05-27 Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui

ToolCoder: Teach Code Generation Models to use API search tools

Automatically generating source code from natural language descriptions has been a growing field of research in recent years. However, current large-scale code generation models often encounter difficulties when selecting appropriate APIs…

Software Engineering · Computer Science 2023-09-12 Kechi Zhang, Huangzhao Zhang, Ge Li, Jia Li, Zhuo Li, Zhi Jin

SantaCoder: don't reach for the stars!

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state…

Software Engineering · Computer Science 2023-02-27 Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

AutoTest: Evolutionary Code Solution Selection with Test Cases

With the development of code generation techniques, selecting the correct code solution from multiple candidate solutions has become a crucial task. This study proposes AutoTest, a novel technique that combines automated test case…

Software Engineering · Computer Science 2024-08-23 Zhihua Duan, Jialin Wang

OctoPack: Instruction Tuning Code Large Language Models

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…

Computation and Language · Computer Science 2024-02-20 Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

HumanEval on Latest GPT Models -- 2024

In 2023, we are using the latest models of GPT-4 to advance program synthesis. The large language models have significantly improved the state-of-the-art for this purpose. To make these advancements more accessible, we have created a…

Computation and Language · Computer Science 2024-02-26 Daniel Li, Lincoln Murr

InstructCoder: Instruction Tuning Large Language Models for Code Editing

Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due…

Computation and Language · Computer Science 2024-02-29 Kaixin Li, Qisheng Hu, Xu Zhao, Hui Chen, Yuxi Xie, Tiedong Liu, Qizhe Xie, Junxian He

EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

As large language models (LLMs) play an increasingly important role in code generation, enhancing both correctness and efficiency has become crucial. Current methods primarily focus on correctness, often overlooking efficiency. To address…

Computation and Language · Computer Science 2025-06-17 Dong Huang, Guangtao Zeng, Jianbo Dai, Meng Luo, Han Weng, Yuhao Qing, Heming Cui, Zhijiang Guo, Jie M. Zhang

BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models

Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular…

Machine Learning · Computer Science 2024-05-22 Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein

PerfCoder: Large Language Models for Interpretable Code Performance Optimization

Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current…

Software Engineering · Computer Science 2026-05-11 Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel, Tomasz Czajkowski, Di Niu

QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback

Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it…

Computation and Language · Computer Science 2025-11-04 Taku Mikuriya, Tatsuya Ishigaki, Masayuki Kawarada, Shunya Minami, Tadashi Kadowaki, Yohichi Suzuki, Soshun Naito, Shunya Takata, Takumi Kato, Tamotsu Basseda, Reo Yamada, Hiroya Takamura