Related papers: SWT-Bench: Testing and Validating Real-World Bug-F…

200 papers

Large language models (LLMs) and LLM-based Agents have been applied to fix bugs automatically, demonstrating the capability in addressing software defects by engaging in development environment interaction, iterative validation and code…

Software Engineering · Computer Science 2025-10-21 Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng

Large Language Model (LLM) code agents increasingly resolve repository-level issues by iteratively editing code, invoking tools, and validating candidate patches. In these workflows, agents often write tests on the fly, but the value of…

Software Engineering · Computer Science 2026-04-10 Zhi Chen, Zhensu Sun, Yuling Shi, Chao Peng, Xiaodong Gu, David Lo, Lingxiao Jiang

The advent of Large Language Models (LLMs) has spurred the development of coding agents for real-world code generation. As a widely used benchmark for evaluating the code generation capabilities of these agents, SWE-Bench uses real-world…

Software Engineering · Computer Science 2025-06-12 Boxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are…

Artificial Intelligence · Computer Science 2025-10-27 Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test…

Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in the area of automatic test generation. Because existing approaches produce test cases that either can be…

Software Engineering · Computer Science 2023-10-11 Laura Plein, Wendkûuni C. Ouédraogo, Jacques Klein, Tegawendé F. Bissyandé

Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially…

Software Engineering · Computer Science 2025-12-23 Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a critical benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench and its variants have become standard in…

Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation suites often comprise only a limited number…

Computation and Language · Computer Science 2025-07-11 Zihan Ma, Taolin Zhang, Maosong Cao, Junnan Liu, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen

Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We…

Artificial Intelligence · Computer Science 2025-05-20 Maxime Robeyns, Martin Szummer, Laurence Aitchison

Software testing is crucial for ensuring the correctness and reliability of software systems. Automated generation of issue reproduction tests from natural language issue descriptions enhances developer productivity by simplifying root…

Software Engineering · Computer Science 2026-01-21 Aditya Bharat Soni, Rajat Ghosh, Vaishnavi Bhargava, Valerie Chen, Debojyoti Dutta

Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence, and the software engineering community has readily noticed their potential role in the software development life-cycle.…

Software Engineering · Computer Science 2026-03-16 Greta Dolcetti, Vincenzo Arceri, Eleonora Iotti, Sergio Maffeis, Agostino Cortesi, Enea Zaffanella

Recent advances in large language models (LLMs) have shown significant potential to automate various software development tasks, including code completion, test generation, and bug fixing. However, the application of LLMs for automated bug…

Software Engineering · Computer Science 2024-09-05 Yizhou Liu, Pengfei Gao, Xinchen Wang, Jie Liu, Yexuan Shi, Zhao Zhang, Chao Peng

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic…

Cryptography and Security · Computer Science 2026-02-02 Yanlin Wang, Ziyao Zhang, Chong Wang, Xinyi Xu, Mingwei Liu, Yong Wang, Jiachi Chen, Zibin Zheng

In light of the rapid adoption of AI coding assistants, LLM-assisted development has become increasingly prevalent, creating an urgent need for robust evaluation of generated code quality. Existing benchmarks often require extensive manual…

Software Engineering · Computer Science 2025-05-21 Yuancheng Jiang, Roland Yap, Zhenkai Liang

In this paper, we present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model designed to review code and identify potential issues. Our proposed LLM-based AI agent model is trained…

Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry…

Software Engineering · Computer Science 2024-10-30 Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and…

Software Engineering · Computer Science 2026-04-27 Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong

The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science 2025-06-23 Xinghang Li, Jingzhe Ding, Chao Peng, Bing Zhao, Xiang Gao, Hongwan Gao, Xinchen Gu

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science 2026-05-27 Kang He, Kaushik Roy