Related papers: CodeShell Technical Report

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce…

Software Engineering · Computer Science 2024-01-29 Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we…

Machine Learning · Computer Science 2024-07-11 Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

CodeEval: A pedagogical approach for targeted evaluation of code-trained Large Language Models

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman, Mohammad Mahoor

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation.…

Software Engineering · Computer Science 2021-03-17 Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

Seed-Coder: Let the Code Model Curate Data for Itself

Code data in large language model (LLM) pretraining is recognized as crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. Current open-source LLMs often heavily rely on human effort to produce their code…

Computation and Language · Computer Science 2025-06-06 ByteDance Seed, Yuyu Zhang, Jing Su, Yifan Sun, Chenguang Xi, Xia Xiao, Shen Zheng, Anxiang Zhang, Kaibo Liu, Daoguang Zan, Tao Sun, Jinhua Zhu, Shulin Xin, Dong Huang, Yetao Bai, Lixin Dong, Chao Li, Jianchong Chen, Hanzhi Zhou, Yifan Huang, Guanghan Ning, Xierui Song, Jiaze Chen, Siyao Liu, Kai Shen, Liang Xiang, Yonghui Wu

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science 2024-10-07 Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang

Human-Aligned Code Readability Assessment with Large Language Models

Code readability is crucial for software comprehension and maintenance, yet difficult to assess at scale. Traditional static metrics often fail to capture the subjective, context-sensitive nature of human judgments. Large Language Models…

Software Engineering · Computer Science 2025-10-21 Wendkûuni C. Ouédraogo, Yinghua Li, Xueqi Dang, Pawel Borsukiewicz, Xin Zhou, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM…

Computation and Language · Computer Science 2025-05-30 Yongchao Chen, Yilun Hao, Yueying Liu, Yang Zhang, Chuchu Fan

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

Stable Code Technical Report

We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks.…

Computation and Language · Computer Science 2024-04-02 Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

StarCoder: may the source be with you!

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling…

Computation and Language · Computer Science 2023-12-14 Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

Pre-trained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with…

Computation and Language · Computer Science 2024-11-14 Jierui Li, Hung Le, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Doyen Sahoo

Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences

Large language models make remarkable progress in reasoning capabilities. Existing works focus mainly on deductive reasoning tasks (e.g., code and math), while another type of reasoning mode that better aligns with human learning, inductive…

Computation and Language · Computer Science 2025-03-18 Kedi Chen, Zhikai Lei, Fan Zhang, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Qipeng Guo, Kai Chen, Wei Zhang

CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs

Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a…

Software Engineering · Computer Science 2025-04-10 Dung Nguyen Manh, Thang Phan Chau, Nam Le Hai, Thong T. Doan, Nam V. Nguyen, Quang Pham, Nghi D. Q. Bui

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai, Yingshui Tan, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, Guanglin Niu, Zhoujun Li, Binyuan Hui, Junyang Lin

Benchmarking Language Models for Code Syntax Understanding

Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding, which represent the input as a token sequence without explicitly modeling its structure. Some prior works…

Computation and Language · Computer Science 2022-10-27 Da Shen, Xinyun Chen, Chenguang Wang, Koushik Sen, Dawn Song

Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed their struggle for code translation. This is because they haven't been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science 2024-10-15 Qingxiao Tao, Tingrui Yu, Xiaodong Gu, Beijun Shen

CodeComplex: Dataset for Worst-Case Time Complexity Prediction

Reasoning ability of Large Language Models (LLMs) is a crucial ability, especially in complex decision-making tasks. One significant task to show LLMs' reasoning capability is code time complexity prediction, which involves various…

Software Engineering · Computer Science 2024-12-25 Seung-Yeop Baik, Joonghyuk Hahn, Jungin Kim, Mingi Jeon, Aditi, Yo-Sub Han, Sang-Ki Ko

A Survey on Large Language Models for Code Generation

Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This…

Computation and Language · Computer Science 2025-10-28 Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot…

Software Engineering · Computer Science 2025-12-09 Jian Yang, Xianglong Liu, Weifeng Lv, Ken Deng, Shawn Guo, Lin Jing, Yizhi Li, Shark Liu, Xianzhen Luo, Yuyu Luo, Changzai Pan, Ensheng Shi, Yingshui Tan, Renshuai Tao, Jiajun Wu, Xianjie Wu, Zhenhe Wu, Daoguang Zan, Chenchen Zhang, Wei Zhang, He Zhu, Terry Yue Zhuo, Kerui Cao, Xianfu Cheng, Jun Dong, Shengjie Fang, Zhiwei Fei, Xiangyuan Guan, Qipeng Guo, Zhiguang Han, Joseph James, Tianqi Luo, Renyuan Li, Yuhang Li, Yiming Liang, Congnan Liu, Jiaheng Liu, Qian Liu, Ruitong Liu, Tyler Loakman, Xiangxin Meng, Chuang Peng, Tianhao Peng, Jiajun Shi, Mingjie Tang, Boyang Wang, Haowen Wang, Yunli Wang, Fanglin Xu, Zihan Xu, Fei Yuan, Ge Zhang, Jiayi Zhang, Xinhao Zhang, Wangchunshu Zhou, Hualei Zhu, King Zhu, Bryan Dai, Aishan Liu, Zhoujun Li, Chenghua Lin, Tianyu Liu, Chao Peng, Kai Shen, Libo Qin, Shuangyong Song, Zizheng Zhan, Jiajun Zhang, Jie Zhang, Zhaoxiang Zhang, Bo Zheng