Related papers: Code-Survey: An LLM-Driven Methodology for Analyzi…
Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This…
Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on single evaluation task and paradigm, such as code completion…
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This…
Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…
The increasing complexity of software systems has driven significant advancements in program analysis, as traditional methods unable to meet the demands of modern software development. To address these limitations, deep learning techniques,…
Code review is a critical practice in modern software engineering, helping developers detect defects early, improve code quality, and facilitate knowledge sharing. With the rapid advancement of large language models (LLMs), a growing body…
Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…
General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in tasks such as code generation in software engineering. This has led to the development of specialized LLMs for software engineering,…
Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexities of systems code, making changes to a systems…
Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as during the onboarding process. Especially when the codebase is large and time is limited, achieving a decent level of comprehension…
Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering…
Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…
Qualitative data analysis provides insight into the underlying perceptions and experiences within unstructured data. However, the time-consuming nature of the coding process, especially for larger datasets, calls for innovative approaches,…
In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static…
The recent progress of artificial intelligence(AI) has shown great potentials for alleviating human burden in various complex tasks. From the view of software engineering, AI techniques can be seen in many fundamental aspects of…
Codebooks -- documents that operationalize concepts and outline annotation procedures -- are used almost universally by social scientists when coding political texts. To code these texts automatically, researchers are increasing turning to…
Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In…
Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…
Many software development tasks, such as implementing features and fixing bugs, begin with developers posing questions about a codebase. However, answering questions about codebases that span millions of lines of code across thousands of…
Large language models (LLMs) are rapidly reshaping software development, but their impact across the software development lifecycle is underexplored. Existing work focuses on isolated activities such as code generation or testing, leaving…