Related papers: Evaluating Agent-based Program Repair at Google

200 papers

Large language models (LLMs) and LLM-based agents have been applied to fix bugs automatically, demonstrating their capability to address software defects by engaging in development environment interaction, iterative validation and code…

Software Engineering · Computer Science 2025-10-21 Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are…

Artificial Intelligence · Computer Science 2025-10-27 Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou

Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially…

Software Engineering · Computer Science 2025-12-23 Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineering agents have shown…

Software Engineering · Computer Science 2025-12-04 Spandan Garg, Roshanak Zilouchian Moghaddam, Neel Sundaresan

Large Language Models (LLMs) have shown impressive capabilities in downstream software engineering tasks such as Automated Program Repair (APR). In particular, there has been a lot of research on repository-level issue-resolution benchmarks…

Software Engineering · Computer Science 2025-06-23 Anvith Pabba, Alex Mathai, Anindya Chakraborty, Baishakhi Ray

AI coding agents demonstrate strong performance on general-purpose software benchmarks. However, their ability to handle 5G network engineering tasks remains unexplored. We propose SWE-Bench 5G, the first benchmark designed to investigate…

Networking and Internet Architecture · Computer Science 2026-04-30 Jiao Chen, Jianhua Tang, Xiaotong Yang, Zuohong Lv

Automated Program Repair (APR) agents leverage Large Language Models (LLMs) to autonomously diagnose and fix software bugs through reasoning, planning, and tool use. Despite impressive leaderboard gains on benchmarks such as SWE-bench,…

Software Engineering · Computer Science 2026-05-28 Ira Ceka, Hailie Mitchell, Saurabh Pujar, Luca Buratti, Shyam Ramji, Junfeng Yang, Gail Kaiser, Baishakhi Ray

Recent research builds various patching agents that combine large language models (LLMs) with non-ML tools and achieve promising results on the state-of-the-art (SOTA) software patching benchmark, SWE-bench. Based on how to determine the…

Robotics · Computer Science 2025-06-12 Hongwei Li, Yuheng Tang, Shiqi Wang, Wenbo Guo

Benchmarks for Software Engineering (SE) AI agents, most notably SWE-bench, have catalyzed progress in programming capabilities of AI agents. However, they overlook critical developer workflows such as Version Control System (VCS)…

Software Engineering · Computer Science 2025-05-29 Tobias Lindenbauer, Egor Bogomolov, Yaroslav Zharov

In recent years, AI-based software engineering has progressed from pre-trained models to advanced agentic workflows, with Software Development Agents representing the next major leap. These agents, capable of reasoning, planning, and…

Software Engineering · Computer Science 2024-12-30 Zhi Chen, Lingxiao Jiang

Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass when it has been resolved, are crucial for debugging, but…

Software Engineering · Computer Science 2025-03-12 Runxiang Cheng, Michele Tufano, Jürgen Cito, José Cambronero, Pat Rondon, Renyao Wei, Aaron Sun, Satish Chandra

The rapid progress in Automated Program Repair (APR) has been driven by advances in AI, particularly large language models (LLMs) and agent-based systems. SWE-Bench is a recent benchmark designed to evaluate LLM-based repair systems using…

Software Engineering · Computer Science 2026-02-06 Matias Martinez, Xavier Franch

Large Language Models (LLMs) have transformed software development and AI applications. While LLMs are designed for text processing, LLM agents extend this capability by enabling autonomous actions, tool use, and multi-step task completion.…

Software Engineering · Computer Science 2026-04-21 Niful Islam, Muhammad Anas Raza, Mohammad Wardat

We introduce SWE-Bench Pro, a substantially more challenging benchmark that builds upon the best practices of SWE-BENCH [25], but is explicitly designed to capture realistic, complex, enterprise-level problems beyond the scope of SWE-BENCH.…

Current benchmarks for evaluating software engineering agents, such as SWE-Bench Verified, are predominantly derived from GitHub issues and fail to accurately reflect how developers interact with chat-based coding assistants in integrated…

Software Engineering · Computer Science 2026-01-27 Spandan Garg, Benjamin Steenhoek, Yufan Huang

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have shifted toward realistic software evolution, but they rarely…

Software Engineering · Computer Science 2026-05-15 Man Ho Lam, Chaozheng Wang, Hange Liu, Jingyu Xiao, Haau-sing Li, Jen-tse Huang, Terry Yue Zhuo, Michael R. Lyu

Automated issue solving seeks to autonomously identify and repair defective code snippets across an entire codebase. SWE-Bench has emerged as the most widely adopted benchmark for evaluating progress in this area. While LLM-based agentic…

Software Engineering · Computer Science 2025-09-18 Simiao Liu, Fang Liu, Liehao Li, Xin Tan, Yinghao Zhu, Xiaoli Lian, Li Zhang

We ask whether agentic AI systems built for software engineering transfer to realistic hardware engineering. Existing hardware LLM benchmarks isolate sub-tasks but none jointly requires repository navigation, hierarchy-aware localization,…

Hardware Architecture · Computer Science 2026-05-18 Qingyun Zou, Feng Yu, Hongshi Tan, Bingsheng He, WengFai Wong

AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond just…

Software Engineering · Computer Science 2026-04-10 Zhi Chen, Wei Ma, Lingxiao Jiang

Automated program repair has traditionally focused on single-hunk defects, overlooking multi-hunk bugs that are prevalent in real-world systems. Repairing these bugs requires coordinated edits across multiple, disjoint code regions, posing…

Software Engineering · Computer Science 2025-11-17 Noor Nashid, Daniel Ding, Keheliya Gallaba, Ahmed E. Hassan, Ali Mesbah