Related papers

Related papers: SWE-Edit: Rethinking Code Editing for Efficient SW…

200 papers

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science 2026-05-27 Kang He, Kaushik Roy

LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While various context compression approaches such as…

Software Engineering · Computer Science 2026-05-08 Yuhang Wang, Yuling Shi, Mo Yang, Rongrui Zhang, Shilin He, Heng Lian, Yuting Chen, Siyu Ye, Kai Cai, Xiaodong Gu

Large language models are increasingly used as coding agents for software engineering tasks. Current benchmarks mainly evaluate whether the agent can correctly solve the request or fix the bugs. They largely treat tasks as independent and…

Software Engineering · Computer Science 2026-05-07 Jiayuan Zhu, Junde Wu, Minhao Hu, Shengda Zhu, Jiazhen Pan, Weixiang Shen, Yijun Yang, Fenglin Liu, Jianye Hao, Yueming Jin, Qirong Ho, Min Xu

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous,…

Software Engineering · Computer Science 2025-08-01 Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang

Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the…

Artificial Intelligence · Computer Science 2026-04-14 Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li

Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have…

Artificial Intelligence · Computer Science 2025-06-24 Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong

Optimizing the performance of large-scale software repositories demands expertise in code reasoning and software engineering (SWE) to reduce runtime while preserving program correctness. However, most benchmarks emphasize what to fix rather…

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have shifted toward realistic software evolution, but they rarely…

Software Engineering · Computer Science 2026-05-15 Man Ho Lam, Chaozheng Wang, Hange Liu, Jingyu Xiao, Haau-sing Li, Jen-tse Huang, Terry Yue Zhuo, Michael R. Lyu

Instructed code editing is a significant challenge for large language models (LLMs). On the EditBench benchmark, 39 of 40 evaluated models obtain a task success rate (TSR) below 60 percent, highlighting a gap between general code generation…

Software Engineering · Computer Science 2026-04-29 Noam Tarshish, Nofar Selouk, Daniel Hodisan, Bar Ezra Gafniel, Yuval Elovici, Asaf Shabtai, Eliya Nachmani

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action…

Software Engineering · Computer Science 2026-02-26 Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

We introduce SWE Atlas, a benchmark suite for coding agents spanning three professional software engineering workflows: Codebase Q&A (124 tasks), Test Writing (90 tasks), and Refactoring (70 tasks). SWE Atlas differs from prior SWE…

Recent advances in large language models (LLMs) have enabled software engineering agents to tackle complex code modification tasks. Most existing approaches rely on execution feedback from containerized environments, which require…

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like…

Software Engineering · Computer Science 2024-11-13 John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline,…

LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training data is scarce, especially data that…

Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or adding a small feature. However, real-world software engineering is a long-horizon endeavor: developers interpret high-level…

Software Engineering · Computer Science 2026-05-25 Tue Le, Minh V. T. Thai, Dung Nguyen Manh, Huy Phan Nhat, Nghi D. Q. Bui

Test-time scaling has been widely adopted to enhance the capabilities of Large Language Model (LLM) agents in software engineering (SWE) tasks. However, the standard approach of repeatedly sampling trajectories from scratch is…

Software Engineering · Computer Science 2026-02-06 Yifeng Ding, Lingming Zhang

We introduce SWE-PRBench, a benchmark of 350 pull requests with human-annotated ground truth for evaluating AI code review quality. Evaluated against an LLM-as-judge framework validated at kappa=0.75, 8 frontier models detect only 15-31% of…

Software Engineering · Computer Science 2026-03-30 Deepak Kumar

Agentic repository-level code understanding is essential for automating complex software engineering tasks, yet the field lacks reliable benchmarks. Existing evaluations often overlook long-tail topics and rely on popular repositories…

Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to…

Software Engineering · Computer Science 2025-10-13 Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill