Related papers: Resolving Java Code Repository Issues with iSWE Ag…

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous,…

Software Engineering · Computer Science · 2025-08-01 · Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large…

Software Engineering · Computer Science · 2024-08-27 · Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild

Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to…

Software Engineering · Computer Science · 2025-10-13 · Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science · 2026-05-27 · Kang He, Kaushik Roy

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large…

Software Engineering · Computer Science · 2025-04-04 · Daoguang Zan, Zhirong Huang, Wei Liu, Hanwu Chen, Linhao Zhang, Shulin Xin, Lu Chen, Qi Liu, Xiaojian Zhong, Aoyan Li, Siyao Liu, Yongsheng Xiao, Liangqiang Chen, Yuyu Zhang, Jing Su, Tianyu Liu, Rui Long, Kai Shen, Liang Xiang

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Issue resolution, a complex Software Engineering (SWE) task integral to real-world development, has emerged as a compelling challenge for artificial intelligence. The establishment of benchmarks like SWE-bench revealed this task as…

Software Engineering · Computer Science · 2026-01-21 · Caihua Li, Lianghong Guo, Yanlin Wang, Daya Guo, Wei Tao, Zhenyu Shan, Mingwei Liu, Jiachi Chen, Haoyu Song, Duyu Tang, Hongyu Zhang, Zibin Zheng

InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution

Large language models have advanced software engineering automation, yet resolving real-world software issues remains difficult because it requires repository-level reasoning, accurate diagnostics, and strong verification signals. Existing…

Software Engineering · Computer Science · 2025-11-21 · KeFan Li, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun, Chunming Hu, Weifeng Lv

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

Optimizing the performance of large-scale software repositories demands expertise in code reasoning and software engineering (SWE) to reduce runtime while preserving program correctness. However, most benchmarks emphasize what to fix rather…

Software Engineering · Computer Science · 2025-11-12 · Jeffrey Jian Ma, Milad Hashemi, Amir Yazdanbakhsh, Kevin Swersky, Ofir Press, Enhui Li, Vijay Janapa Reddi, Parthasarathy Ranganathan

SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains challenging. We…

Software Engineering · Computer Science · 2025-04-25 · Muhammad Shihab Rashid, Christian Bock, Yuan Zhuang, Alexander Buchholz, Tim Esler, Simon Valentin, Luca Franceschi, Martin Wistuba, Prabhu Teja Sivaprasad, Woo Jung Kim, Anoop Deoras, Giovanni Zappella, Laurent Callot

SWE-Edit: Rethinking Code Editing for Efficient SWE-Agent

Large language model agents have made strong progress on software engineering, yet current systems suffer from a context coupling problem: the standard code editing interface conflates code inspection, modification planning, and edit…

Software Engineering · Computer Science · 2026-05-27 · Yikai Zhang, Jiaxin Pei, Kenan Li, Qirui Jin, Maoquan Wang, Jin Pan, Yu Kang, Shengyu Fu, Elsie Nallipogu, Junjie Hu, Yufan Huang, Zijian Jin

Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration

This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Deployed in TONGYI Lingma, an IDE-based coding…

Software Engineering · Computer Science · 2025-03-27 · Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

Current code-agent benchmarks primarily evaluate localized issue resolution within a single target repository, leaving under-tested many software engineering tasks that require external knowledge or broader repository-level changes. We…

Computation and Language · Computer Science · 2026-05-27 · Guoxin Chen, Fanzhe Meng, Jiale Zhao, Minghao Li, Daixuan Cheng, Huatong Song, Jie Chen, Yuzhi Lin, Hui Chen, Xin Zhao, Ruihua Song, Chang Liu, Cheng Chen, Kai Jia, Ji-Rong Wen

Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios

AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond just…

Software Engineering · Computer Science · 2026-04-10 · Zhi Chen, Wei Ma, Lingxiao Jiang

Can Agents Fix Agent Issues?

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are…

Artificial Intelligence · Computer Science · 2025-10-27 · Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou

SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling

Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have…

Artificial Intelligence · Computer Science · 2025-06-24 · Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong

Middleware-based multi-agent development environment for building and testing distributed intelligent systems

The spread of the Internet of Things (IoT) is demanding new, powerful architectures for handling the huge amounts of data produced by the IoT devices. In many scenarios, many existing isolated solutions applied to IoT devices use a set of…

Multiagent Systems · Computer Science · 2024-02-19 · Francisco José Aguayo-Canela, Héctor Alaiz-Moretón, María Teresa García-Ordás, José Alberto Benítez-Andrades, Carmen Benavides, Paulo Novais, Isaías García-Rodríguez

SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks

AI coding agents have shown great progress on Python software engineering benchmarks like SWE-Bench, and for other languages like Java and C in benchmarks like Multi-SWE-Bench. However, C# -- a prominent enterprise language ranking #5 in…

Software Engineering · Computer Science · 2025-11-19 · Sanket Mhatre, Yasharth Bajpai, Sumit Gulwani, Emerson Murphy-Hill, Gustavo Soares

CodeR: Issue Resolving with Multi-Agent and Task Graphs

GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and…

Computation and Language · Computer Science · 2024-06-12 · Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints

Software engineers resolving repository-level issues do not treat existing tests as immutable correctness oracles. Instead, they iteratively refine both code and the tests used to characterize intended behavior, as new modifications expose…

Software Engineering · Computer Science · 2026-04-07 · Kefan Li, Yuan Yuan, Mengfei Wang, Shihao Zheng, Wei Wang, Ping Yang, Mu Li, Weifeng Lv

An Empirical Study on Failures in Automated Issue Solving

Automated issue solving seeks to autonomously identify and repair defective code snippets across an entire codebase. SWE-Bench has emerged as the most widely adopted benchmark for evaluating progress in this area. While LLM-based agentic…

Software Engineering · Computer Science · 2025-09-18 · Simiao Liu, Fang Liu, Liehao Li, Xin Tan, Yinghao Zhu, Xiaoli Lian, Li Zhang