Related papers: The GitHub Recent Bugs Dataset for Evaluating LLM-…
Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…
Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…
Large Language Models (LLMs) are increasingly applied to automated software testing, yet their ability to generalize beyond memorized patterns and reason about natural language bug reports remains unclear. We present a systematic evaluation…
Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e.,…
Large language models have shown good potential in supporting software development tasks. This is why more and more developers turn to LLMs (e.g. ChatGPT) to support them in fixing their buggy code. While this can save time and effort, many…
Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world…
Many automated test generation techniques have been developed to aid developers with writing tests. To facilitate full automation, most existing techniques aim to either increase coverage, or generate exploratory inputs. However, existing…
The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven…
Bugs are notoriously challenging: they slow down software users and result in time-consuming investigations for developers. These challenges are exacerbated when bugs must be reported in natural language by users. Indeed, we lack reliable…
Context: Large Language Models (LLMs) such as ChatGPT are increasingly adopted in software engineering (SE) education, offering both opportunities and challenges. Their adoption requires systematic investigation to ensure responsible…
Background: Bug reports are essential to the software development life cycle. They help developers track and resolve issues, but are often difficult to process due to their complexity, which can delay resolution and affect software quality.…
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the…
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, the training data used to develop these models often contain a significant amount of buggy code. Yet, it remains unclear to what extent these…
With the rapid development of large language models (LLMs), distributed training and inference frameworks like DeepSpeed have become essential for scaling model training and inference across multiple GPUs or nodes. However, the increasing…
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion.…
Large Language Models (LLMs) are currently used extensively to generate code by professionals and students, motivating the development of tools to detect LLM-generated code for applications such as academic integrity and cybersecurity. We…
Failure-inducing inputs play a crucial role in diagnosing and analyzing software bugs. Bug reports typically contain these inputs, which developers extract to facilitate debugging. Since bug reports are written in natural language, prior…
Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…
The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…
Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable…