Related papers: CodeImprove: Program Adaptation for Deep Code Mode…
Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large amount of code snippets. While these deep code models have achieved great success, even state-of-the-art models suffer…
Testing deep learning (DL) systems requires extensive and diverse, yet valid, test inputs. While synthetic test input generation methods, such as metamorphic testing, are widely used for DL testing, they risk introducing invalid inputs that…
Advancements in deep learning have significantly improved model performance across tasks involving code, text, and image processing. However, these models still exhibit notable mispredictions in real-world applications, even when trained on…
Code language models (CLMs) play a central role in software engineering across both generation and classification tasks. However, these models still exhibit notable mispredictions in real-world applications, even when trained on up-to-date…
With the decline of Moore's law, optimizing program performance has become a major focus of software research. However, high-level optimizations such as API and algorithm changes remain elusive due to the difficulty of understanding the…
Machine learning models trained on code and related artifacts offer valuable support for software maintenance but suffer from interpretability issues due to their complex internal variables. These concerns are particularly significant in…
As one of the key tools in many security tasks, decompilers reconstruct human-readable source code from binaries. Yet, despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read.…
Code review is a common process that is used by developers, in which a reviewer provides useful comments or points out defects in the submitted source code changes via pull request. Code review has been widely used for both industry and…
The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…
Understanding code represents a core ability needed for automating software development tasks. While foundation models like LLMs show impressive results across many software engineering challenges, the extent of their true semantic…
When evaluating the performance of a pre-trained model transferred to a downstream task, it is imperative to assess not only the in-distribution (ID) accuracy of the downstream model but also its capacity to generalize and identify…
Testing of deep learning models is challenging due to the excessive number and complexity of computations involved. As a result, test data selection is performed manually and in an ad hoc way. This raises the question of how we can…
AI modeling for source code understanding tasks has been making significant progress, and is being adopted in production development pipelines. However, reliability concerns, especially whether the models are actually learning task-related…
The advent of instruction-tuned Large Language Models designed for coding tasks (Code LLMs) has transformed software engineering practices. However, their robustness against various input challenges remains a critical concern. This study…
In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task -- a variant of program repair which aims to adapt variable…
Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…
Code reviews are popular in both industrial and open source projects. The benefits of code reviews are widely recognized and include better code quality and lower likelihood of introducing bugs. However, since code review is a manual…
Intelligent code analysis has received increasing attention in parallel with the remarkable advances in the field of machine learning (ML) in recent years. A major challenge in leveraging ML for this purpose is to represent source code in a…
As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality…
Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field due to their ability to perform various code-related tasks. These models can process source code and software…