Related papers: WizardCoder: Empowering Code Large Language Models…

WizardLM: Empowering large pre-trained language models to follow complex instructions

Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce…

Computation and Language · Computer Science 2025-05-28 Can Xu , Qingfeng Sun , Kai Zheng , Xiubo Geng , Pu Zhao , Jiazhan Feng , Chongyang Tao , Qingwei Lin , Daxin Jiang

Finetuning Large Language Models for Vulnerability Detection

This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for…

Cryptography and Security · Computer Science 2024-07-30 Alexey Shestov , Rodion Levichev , Ravil Mussabayev , Evgeny Maslov , Anton Cheshkov , Pavel Zadorozhny

WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs…

Computation and Language · Computer Science 2024-06-10 Zhaojian Yu , Xin Zhang , Ning Shang , Yangyu Huang , Can Xu , Yishujie Zhao , Wenxiang Hu , Qiufeng Yin

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on…

Computation and Language · Computer Science 2025-06-05 Haipeng Luo , Qingfeng Sun , Can Xu , Pu Zhao , Jianguang Lou , Chongyang Tao , Xiubo Geng , Qingwei Lin , Shifeng Chen , Yansong Tang , Dongmei Zhang

Magicoder: Empowering Code Generation with OSS-Instruct

We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are…

Computation and Language · Computer Science 2024-06-10 Yuxiang Wei , Zhe Wang , Jiawei Liu , Yifeng Ding , Lingming Zhang

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He , Yue Chen , Fangkai Yang , Pu Zhao , Wenjie Yin , Yu Kang , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang

CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge

In this paper, we introduce CodingTeachLLM, a large language model (LLM) designed for coding teaching. Specially, we aim to enhance the coding ability of LLM and lead it to better teaching mode in education context. Thus, we propose an…

Machine Learning · Computer Science 2025-04-02 Zhangquan Chen , Chunjiang Liu , Haobin Duan

InstructCoder: Instruction Tuning Large Language Models for Code Editing

Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due…

Computation and Language · Computer Science 2024-02-29 Kaixin Li , Qisheng Hu , Xu Zhao , Hui Chen , Yuxi Xie , Tiedong Liu , Qizhe Xie , Junxian He

ConceptCoder: Improve Code Reasoning via Concept Learning

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce ConceptCoder, a fine-tuning method that…

Software Engineering · Computer Science 2026-03-25 Md Mahbubur Rahman , Hengbo Tong , Wei Le

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality…

Computation and Language · Computer Science 2025-02-04 Zifan Song , Yudong Wang , Wenwei Zhang , Kuikun Liu , Chengqi Lyu , Demin Song , Qipeng Guo , Hang Yan , Dahua Lin , Kai Chen , Cairong Zhao

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this…

Computation and Language · Computer Science 2024-02-15 Yejie Wang , Keqing He , Guanting Dong , Pei Wang , Weihao Zeng , Muxi Diao , Yutao Mou , Mengdi Zhang , Jingang Wang , Xunliang Cai , Weiran Xu

AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test ($\mathbf{90.9\%}$ vs. $\mathbf{90.2\%}$). In addition, AutoCoder offers a more versatile code…

Software Engineering · Computer Science 2024-05-27 Bin Lei , Yuchen Li , Qiuwu Chen

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs

Large Language Models (LLMs) have transformed software development by enabling code generation, automated debugging, and complex reasoning. However, their continued advancement is constrained by the scarcity of high-quality, publicly…

Software Engineering · Computer Science 2025-08-11 Wasi Uddin Ahmad , Aleksander Ficek , Mehrzad Samadi , Jocelyn Huang , Vahid Noroozi , Somshubra Majumdar , Boris Ginsburg

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a…

Computation and Language · Computer Science 2024-12-17 Yutong Wu , Di Huang , Wenxuan Shi , Wei Wang , Lingzhe Gao , Shihao Liu , Ziyuan Nan , Kaizhao Yuan , Rui Zhang , Xishan Zhang , Zidong Du , Qi Guo , Yewen Pu , Dawei Yin , Xing Hu , Yunji Chen

WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models

Despite recent progress achieved by code large language models (LLMs), their remarkable abilities are largely dependent on fine-tuning on the high-quality data, posing challenges for data collection and annotation. To address this, current…

Computation and Language · Computer Science 2025-02-19 Huawen Feng , Pu Zhao , Qingfeng Sun , Can Xu , Fangkai Yang , Lu Wang , Qianli Ma , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang , Qi Zhang

EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

As large language models (LLMs) play an increasingly important role in code generation, enhancing both correctness and efficiency has become crucial. Current methods primarily focus on correctness, often overlooking efficiency. To address…

Computation and Language · Computer Science 2025-06-17 Dong Huang , Guangtao Zeng , Jianbo Dai , Meng Luo , Han Weng , Yuhao Qing , Heming Cui , Zhijiang Guo , Jie M. Zhang

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu , Uri Alon , Graham Neubig , Vincent J. Hellendoorn

StarCoder: may the source be with you!

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling…

Computation and Language · Computer Science 2023-12-14 Raymond Li , Loubna Ben Allal , Yangtian Zi , Niklas Muennighoff , Denis Kocetkov , Chenghao Mou , Marc Marone , Christopher Akiki , Jia Li , Jenny Chim , Qian Liu , Evgenii Zheltonozhskii , Terry Yue Zhuo , Thomas Wang , Olivier Dehaene , Mishig Davaadorj , Joel Lamy-Poirier , João Monteiro , Oleh Shliazhko , Nicolas Gontier , Nicholas Meade , Armel Zebaze , Ming-Ho Yee , Logesh Kumar Umapathi , Jian Zhu , Benjamin Lipkin , Muhtasham Oblokulov , Zhiruo Wang , Rudra Murthy , Jason Stillerman , Siva Sankalp Patel , Dmitry Abulkhanov , Marco Zocca , Manan Dey , Zhihan Zhang , Nour Fahmy , Urvashi Bhattacharyya , Wenhao Yu , Swayam Singh , Sasha Luccioni , Paulo Villegas , Maxim Kunakov , Fedor Zhdanov , Manuel Romero , Tony Lee , Nadav Timor , Jennifer Ding , Claire Schlesinger , Hailey Schoelkopf , Jan Ebert , Tri Dao , Mayank Mishra , Alex Gu , Jennifer Robinson , Carolyn Jane Anderson , Brendan Dolan-Gavitt , Danish Contractor , Siva Reddy , Daniel Fried , Dzmitry Bahdanau , Yacine Jernite , Carlos Muñoz Ferrandis , Sean Hughes , Thomas Wolf , Arjun Guha , Leandro von Werra , Harm de Vries

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary…

Computation and Language · Computer Science 2025-03-21 Siming Huang , Tianhao Cheng , J. K. Liu , Jiaran Hao , Liuyihan Song , Yang Xu , J. Yang , Jiaheng Liu , Chenchen Zhang , Linzheng Chai , Ruifeng Yuan , Zhaoxiang Zhang , Jie Fu , Qian Liu , Ge Zhang , Zili Wang , Yuan Qi , Yinghui Xu , Wei Chu

AceCoder: Utilizing Existing Code to Enhance Code Generation

Large Language Models (LLMs) have shown great success in code generation. LLMs take as the input a prompt and output the code. A key question is how to make prompts (i.e., Prompting Techniques). Existing prompting techniques are designed…

Software Engineering · Computer Science 2023-09-08 Jia Li , Yunfei Zhao , Yongmin Li , Ge Li , Zhi Jin