Related papers: SWE-Edit: Rethinking Code Editing for Efficient SW…

SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science 2026-05-27 Kang He, Kaushik Roy

SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While various context compression approaches such as…

Software Engineering · Computer Science 2026-05-08 Yuhang Wang, Yuling Shi, Mo Yang, Rongrui Zhang, Shilin He, Heng Lian, Yuting Chen, Siyu Ye, Kai Cai, Xiaodong Gu

SWE Context Bench: A Benchmark for Context Learning in Coding

Large language models are increasingly used as coding agents for software engineering tasks. Current benchmarks mainly evaluate whether the agent can correctly solve the request or fix the bugs. They largely treat tasks as independent and…

Software Engineering · Computer Science 2026-05-07 Jiayuan Zhu, Junde Wu, Minhao Hu, Shengda Zhu, Jiazhen Pan, Weixiang Shen, Yijun Yang, Fenglin Liu, Jianye Hao, Yueming Jin, Qirong Ho, Min Xu

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous,…

Software Engineering · Computer Science 2025-08-01 Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the…

Artificial Intelligence · Computer Science 2026-04-14 Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li

SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling

Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have…

Artificial Intelligence · Computer Science 2025-06-24 Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

Optimizing the performance of large-scale software repositories demands expertise in code reasoning and software engineering (SWE) to reduce runtime while preserving program correctness. However, most benchmarks emphasize what to fix rather…

Software Engineering · Computer Science 2025-11-12 Jeffrey Jian Ma, Milad Hashemi, Amir Yazdanbakhsh, Kevin Swersky, Ofir Press, Enhui Li, Vijay Janapa Reddi, Parthasarathy Ranganathan

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have shifted toward realistic software evolution, but they rarely…

Software Engineering · Computer Science 2026-05-15 Man Ho Lam, Chaozheng Wang, Hange Liu, Jingyu Xiao, Haau-sing Li, Jen-tse Huang, Terry Yue Zhuo, Michael R. Lyu

SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?

Instructed code editing is a significant challenge for large language models (LLMs). On the EditBench benchmark, 39 of 40 evaluated models obtain a task success rate (TSR) below 60 percent, highlighting a gap between general code generation…

Software Engineering · Computer Science 2026-04-29 Noam Tarshish, Nofar Selouk, Daniel Hodisan, Bar Ezra Gafniel, Yuval Elovici, Asaf Shabtai, Eliya Nachmani

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action…

Software Engineering · Computer Science 2026-02-26 Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

We introduce SWE Atlas, a benchmark suite for coding agents spanning three professional software engineering workflows: Codebase Q&A (124 tasks), Test Writing (90 tasks), and Refactoring (70 tasks). SWE Atlas differs from prior SWE…

Machine Learning · Computer Science 2026-05-12 Mohit Raghavendra, Soham Dan, Miguel Romero Calvo, Yannis Yiming He, Johannes Baptist Mols, Gautam Anand, Cole McCollum, Edgar Arakelyan, Vijay Bharadwaj, Andrew Park, Jeff Da, MohammadHossein Rezaei, Bing Liu, Brad Kenstler, Yunzhong He

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Recent advances in large language models (LLMs) have enabled software engineering agents to tackle complex code modification tasks. Most existing approaches rely on execution feedback from containerized environments, which require…

Software Engineering · Computer Science 2026-02-04 Shuang Sun, Huatong Song, Lisheng Huang, Jinhao Jiang, Ran Le, Zhihao Lv, Zongchao Chen, Yiwen Hu, Wenyang Luo, Wayne Xin Zhao, Yang Song, Hongteng Xu, Tao Zhang, Ji-Rong Wen

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like…

Software Engineering · Computer Science 2024-11-13 John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline,…

Software Engineering · Computer Science 2026-02-25 Huatong Song, Lisheng Huang, Shuang Sun, Jinhao Jiang, Ran Le, Daixuan Cheng, Guoxin Chen, Yiwen Hu, Zongchao Chen, Yiming Jia, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training data is scarce, especially data that…

Software Engineering · Computer Science 2025-11-05 Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich, Anton Shevtsov, Simon Karasik, Andrei Andriushchenko, Maria Trofimova, Daria Litvintseva, Boris Yangel

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or adding a small feature. However, real-world software engineering is a long-horizon endeavor: developers interpret high-level…

Software Engineering · Computer Science 2026-05-25 Tue Le, Minh V. T. Thai, Dung Nguyen Manh, Huy Phan Nhat, Nghi D. Q. Bui

SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents

Test-time scaling has been widely adopted to enhance the capabilities of Large Language Model (LLM) agents in software engineering (SWE) tasks. However, the standard approach of repeatedly sampling trajectories from scratch is…

Software Engineering · Computer Science 2026-02-06 Yifeng Ding, Lingming Zhang

SWE-PRBench: Benchmarking AI Code Review Quality Against Pull Request Feedback

We introduce SWE-PRBench, a benchmark of 350 pull requests with human-annotated ground truth for evaluating AI code review quality. Evaluated against an LLM-as-judge framework validated at kappa=0.75, 8 frontier models detect only 15-31% of…

Software Engineering · Computer Science 2026-03-30 Deepak Kumar

SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding

Agentic repository-level code understanding is essential for automating complex software engineering tasks, yet the field lacks reliable benchmarks. Existing evaluations often overlook long-tail topics and rely on popular repositories…

Software Engineering · Computer Science 2026-03-18 Songcheng Cai, Zhiheng Lyu, Yuansheng Ni, Xiangchao Chen, Baichuan Zhou, Shenzhe Zhu, Yi Lu, Haozhe Wang, Chi Ruan, Benjamin Schneider, Weixu Zhang, Xiang Li, Andy Zheng, Yuyu Zhang, Ping Nie, Wenhu Chen

Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild

Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to…

Software Engineering · Computer Science 2025-10-13 Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill