Related papers: CodeFuse-Query: A Data-Centric Static Code Analysi…
Static analysis is a growing application of software engineering, leading to a range of essential security tools, bug-finding tools, as well as software verification. Recent years show an increase of universal static analysis tools that…
As code search permeates most activities in software development,code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for…
Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged…
Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In…
To detect and fix bugs and security vulnerabilities, software companies use static analysis as part of the development process. However, static analysis code itself is also prone to bugs. To ensure a consistent level of precision, as…
Modern software systems like the Linux kernel are among the world's largest and most intricate codebases, continually evolving with new features and increasing complexity. Understanding these systems poses significant challenges due to…
Code Large Language Models (Code LLMs) have gained significant attention in the industry due to their wide applications in the full lifecycle of software engineering. However, the effectiveness of existing models in understanding…
Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine…
In recent years, a lot of technological advances in computer science have aided software programmers to create innovative and real-time user-friendly software. With the creation of the software and the urging interest of people to learn to…
The source code of successful projects is evolving all the time, resulting in hundreds of thousands of code changes stored in source code repositories. This wealth of data can be useful, e.g., to find changes similar to a planned code…
Repository-scale code reasoning is a cornerstone of modern AI-assisted software engineering, enabling Large Language Models (LLMs) to handle complex workflows from program comprehension to complex debugging. However, balancing accuracy with…
We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…
Software redesign preserves functionality while improving quality attributes, but manual reuse of code and tests is costly and error-prone, especially in crossrepository redesigns. Focusing on static analyzers where cross-repo redesign…
CodeChecker is an open source project that integrates different static analysis tools such as the Clang Static Analyzer and Clang-Tidy into the build systems, continuous integration loops, and development workflows of C++ programmers. It…
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly…
In this paper, we present a novel marriage of static and dynamic analysis. Given a large code base with many functions and a mature test suite, we propose using static analysis to find functions 1) with assertions or other evident…
Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…
Modern code review is a widely used technique employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure adherence to coding standards and guidelines. During code review, developers may…
Static source code analysis is a powerful tool for finding and fixing bugs when deployed properly; it is, however, all too easy to deploy it in a way that looks good superficially, but which misses important defects, shows many false…
Code review is a key development practice that contributes to improve software quality and to foster knowledge sharing among developers. However, code review usually takes time and demands detailed and time-consuming analysis of textual…