Related papers: SemCoder: Training Code Language Models with Compr…

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma , Zhihao Lin , Shangqing Liu , Qiang Hu , Ye Liu , Wenhan Wang , Cen Zhang , Liming Nie , Li Li , Yang Liu , Lingxiao Jiang

CodeMind: Evaluating Large Language Models for Code Reasoning

Large Language Models (LLMs) have been widely used to automate programming tasks. Their capabilities have been evaluated by assessing the quality of generated code through tests or proofs. The extent to which they can reason about code is a…

Software Engineering · Computer Science 2026-04-08 Changshu Liu , Yang Chen , Reyhaneh Jabbarvand

Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models

Code Large Language Models (Code LLMs) have opened a new era in programming with their impressive capabilities. However, recent research has revealed critical limitations in their ability to reason about runtime behavior and understand the…

Software Engineering · Computer Science 2025-09-25 Jian Wang , Xiaofei Xie , Qiang Hu , Shangqing Liu , Yi Li

MonoCoder: Domain-Specific Code Language Model for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop large language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the…

Programming Languages · Computer Science 2024-09-23 Tal Kadosh , Niranjan Hasabnis , Vy A. Vo , Nadav Schneider , Neva Krien , Mihai Capota , Abdul Wasay , Nesreen Ahmed , Ted Willke , Guy Tamir , Yuval Pinter , Timothy Mattson , Gal Oren

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis , Arindam Sharma , Cristina David

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for…

Computation and Language · Computer Science 2024-04-04 Hyungjoo Chae , Yeonghyeon Kim , Seungone Kim , Kai Tzu-iunn Ong , Beong-woo Kwak , Moohyeon Kim , Seonghwan Kim , Taeyoon Kwon , Jiwan Chung , Youngjae Yu , Jinyoung Yeo

Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science 2024-10-07 Jiaxin Wen , Jian Guan , Hongning Wang , Wei Wu , Minlie Huang

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He , Yue Chen , Fangkai Yang , Pu Zhao , Wenjie Yin , Yu Kang , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang

Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation

Large language models (LLMs) have showcased remarkable prowess in code generation. However, automated code generation is still challenging since it requires a high-level semantic mapping between natural language requirements and codes. Most…

Computation and Language · Computer Science 2023-10-24 Yingwei Ma , Yue Yu , Shanshan Li , Yu Jiang , Yong Guo , Yuanliang Zhang , Yutao Xie , Xiangke Liao

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models

Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on…

Computation and Language · Computer Science 2025-06-02 Hanxing Ding , Shuchang Tao , Liang Pang , Zihao Wei , Jinyang Gao , Bolin Ding , Huawei Shen , Xueqi Cheng

Steering Large Language Models between Code Execution and Textual Reasoning

While a lot of recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by optimizing the multi-agent framework or reasoning chains, several benchmark tasks can be solved with 100\% success…

Computation and Language · Computer Science 2025-03-04 Yongchao Chen , Harsh Jhamtani , Srinagesh Sharma , Chuchu Fan , Chi Wang

On Code-Induced Reasoning in LLMs

Code data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We…

Computation and Language · Computer Science 2025-10-03 Abdul Waheed , Zhen Wu , Carolyn Rosé , Daphne Ippolito

NExT: Teaching Large Language Models to Reason about Code Execution

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck…

Machine Learning · Computer Science 2024-04-24 Ansong Ni , Miltiadis Allamanis , Arman Cohan , Yinlin Deng , Kensen Shi , Charles Sutton , Pengcheng Yin

From Context to Intent: Reasoning-Guided Function-Level Code Completion

The growing capabilities of Large Language Models (LLMs) have led to their widespread adoption for function completion within code repositories. Recent studies on such tasks show promising results when explicit instructions, often in the…

Software Engineering · Computer Science 2026-03-25 Yanzhou Li , Tianlin Li , Yiran Zhang , Shangqing Liu , Aishan Liu , Xianglong Liu , Yang Liu

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu , Xiao Yan , Zhenrui Li , Jipeng Xiao , Haochuan He , Yongda Yu , Hao Zhang , Guoping Rong , Xiaobo Huang

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

Large Language Models (LLMs) have showcased remarkable capabilities in following human instructions. However, recent studies have raised concerns about the robustness of LLMs when prompted with instructions combining textual adversarial…

Computation and Language · Computer Science 2024-02-27 Yuansen Zhang , Xiao Wang , Zhiheng Xi , Han Xia , Tao Gui , Qi Zhang , Xuanjing Huang

Seed-Coder: Let the Code Model Curate Data for Itself

Code data in large language model (LLM) pretraining is recognized crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. Current open-source LLMs often heavily rely on human effort to produce their code…

Computation and Language · Computer Science 2025-06-06 ByteDance Seed , Yuyu Zhang , Jing Su , Yifan Sun , Chenguang Xi , Xia Xiao , Shen Zheng , Anxiang Zhang , Kaibo Liu , Daoguang Zan , Tao Sun , Jinhua Zhu , Shulin Xin , Dong Huang , Yetao Bai , Lixin Dong , Chao Li , Jianchong Chen , Hanzhi Zhou , Yifan Huang , Guanghan Ning , Xierui Song , Jiaze Chen , Siyao Liu , Kai Shen , Liang Xiang , Yonghui Wu

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this…

Computation and Language · Computer Science 2024-02-15 Yejie Wang , Keqing He , Guanting Dong , Pei Wang , Weihao Zeng , Muxi Diao , Yutao Mou , Mengdi Zhang , Jingang Wang , Xunliang Cai , Weiran Xu

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh , Beiyu Lin

ConceptCoder: Improve Code Reasoning via Concept Learning

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce ConceptCoder, a fine-tuning method that…

Software Engineering · Computer Science 2026-03-25 Md Mahbubur Rahman , Hengbo Tong , Wei Le