WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Computation and Language 2025-05-28 v2 Artificial Intelligence

Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM.

Cite

@article{arxiv.2306.08568,
  title  = {WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author = {Ziyang Luo and Can Xu and Pu Zhao and Qingfeng Sun and Xiubo Geng and Wenxiang Hu and Chongyang Tao and Jing Ma and Qingwei Lin and Daxin Jiang},
  journal= {arXiv preprint arXiv:2306.08568},
  year   = {2025}
}

Comments

Large Language Model, Code Generation, Code LLMs. This paper has been accepted to ICLR 2024. Please cite the ICLR version.