Related papers: Code-Survey: An LLM-Driven Methodology for Analyzi…

A Survey on Large Language Models for Code Generation

Large Language Models (LLMs) for code, known as Code LLMs, have made remarkable advances across diverse code-related tasks, particularly code generation, which produces source code from natural language descriptions. This…

Computation and Language · Computer Science 2025-10-28 Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim

Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on a single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science 2025-05-16 Yutao Mou, Xiao Deng, Yuxiao Luo, Shikun Zhang, Wei Ye

Code LLMs: A Taxonomy-based Survey

Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This…

Computation and Language · Computer Science 2024-12-12 Nishat Raihan, Christian Newman, Marcos Zampieri

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…

Software Engineering · Computer Science 2025-03-21 Pankaj Thorat, Adnan Qidwai, Adrija Dhar, Aishwariya Chakraborty, Anand Eswaran, Hima Patel, Praveen Jayachandran

A Contemporary Survey of Large Language Model Assisted Program Analysis

The increasing complexity of software systems has driven significant advancements in program analysis, as traditional methods are unable to meet the demands of modern software development. To address these limitations, deep learning techniques,…

Software Engineering · Computer Science 2025-02-27 Jiayimei Wang, Tao Ni, Wei-Bin Lee, Qingchuan Zhao

A Survey of Code Review Benchmarks and Evaluation Practices in Pre-LLM and LLM Era

Code review is a critical practice in modern software engineering, helping developers detect defects early, improve code quality, and facilitate knowledge sharing. With the rapid advancement of large language models (LLMs), a growing body…

Software Engineering · Computer Science 2026-02-17 Taufiqul Islam Khan, Shaowei Wang, Haoxiang Zhang, Tse-Hsun Chen

Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets

Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…

Software Engineering · Computer Science 2025-03-25 Hamed Jelodar, Mohammad Meymani, Roozbeh Razavi-Far

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in tasks such as code generation in software engineering. This has led to the development of specialized LLMs for software engineering,…

Software Engineering · Computer Science 2024-01-09 Zibin Zheng, Kaiwen Ning, Yanlin Wang, Jingwen Zhang, Dewu Zheng, Mingxi Ye, Jiachi Chen

Code Researcher: Deep Research Agent for Large Systems Code and Commit History

Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexities of systems code, making changes to a systems…

Software Engineering · Computer Science 2026-05-21 Ramneet Singh, Sathvik Joel, Abhav Mehrotra, Nalin Wadhwa, Ramakrishna B Bairi, Aditya Kanade, Nagarajan Natarajan

Understanding Codebase like a Professional! Human-AI Collaboration for Code Comprehension

Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as during the onboarding process. Especially when the codebase is large and time is limited, achieving a decent level of comprehension…

Human-Computer Interaction · Computer Science 2026-02-16 Jie Gao, Yue Xue, Xiaofei Xie, SoeMin Thant, Erika Lee, Bowen Xu

AI-Guided Exploration of Large-Scale Codebases

Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering…

Software Engineering · Computer Science 2025-08-11 Yoseph Berhanu Alebachew

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

Decoding Complexity: Exploring Human-AI Concordance in Qualitative Coding

Qualitative data analysis provides insight into the underlying perceptions and experiences within unstructured data. However, the time-consuming nature of the coding process, especially for larger datasets, calls for innovative approaches,…

Human-Computer Interaction · Computer Science 2024-03-12 Elisabeth Kirsten, Annalina Buckmann, Abraham Mhaidli, Steffen Becker

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static…

Software Engineering · Computer Science 2024-01-04 Xiaoheng Xie, Gang Fan, Xiaojun Lin, Ang Zhou, Shijie Li, Xunjin Zheng, Yinan Liang, Yu Zhang, Na Yu, Haokun Li, Xinyu Chen, Yingzhuang Chen, Yi Zhen, Dejun Dong, Xianjin Fu, Jinzhou Su, Fuxiong Pan, Pengshuai Luo, Youzheng Feng, Ruoxiang Hu, Jing Fan, Jinguo Zhou, Xiao Xiao, Peng Di

Rethinking complexity for software code structures: A pioneering study on Linux kernel code repository

The recent progress of artificial intelligence (AI) has shown great potential for alleviating human burden in various complex tasks. From the view of software engineering, AI techniques can be seen in many fundamental aspects of…

Software Engineering · Computer Science 2021-03-02 Wenhe Zhang, Jin He, Kevin Song

Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts

Codebooks -- documents that operationalize concepts and outline annotation procedures -- are used almost universally by social scientists when coding political texts. To code these texts automatically, researchers are increasingly turning to…

Computation and Language · Computer Science 2026-04-01 Andrew Halterman, Katherine A. Keith

CodeShell Technical Report

Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In…

Software Engineering · Computer Science 2024-03-26 Rui Xie, Zhengran Zeng, Zhuohao Yu, Chang Gao, Shikun Zhang, Wei Ye

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks range from software engineering development to general-purpose reasoning. While current benchmarks have…

Software Engineering · Computer Science 2025-04-02 Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra

Generating Complex Code Analyzers from Natural Language Questions

Many software development tasks, such as implementing features and fixing bugs, begin with developers posing questions about a codebase. However, answering questions about codebases that span millions of lines of code across thousands of…

Software Engineering · Computer Science 2026-05-12 Amirmohammad Nazari, Sadra Sabouri, Wang Bill Zhu, Robin Jia, Souti Chattopadhyay, Mukund Raghothaman

LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters

Large language models (LLMs) are rapidly reshaping software development, but their impact across the software development lifecycle is underexplored. Existing work focuses on isolated activities such as code generation or testing, leaving…

Software Engineering · Computer Science 2025-11-25 Benyamin Tabarsi, Heidi Reichert, Sam Gilson, Ally Limke, Sandeep Kuttal, Tiffany Barnes