Related papers: Evaluating Software Development Agents: Patch Patt…

Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios

AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond just…

Software Engineering · Computer Science 2026-04-10 Zhi Chen, Wei Ma, Lingxiao Jiang

What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow & GitHub Issues

AI Agents have rapidly gained prominence in both research and industry as systems that extend large language models with planning, tool use, memory, and goal-directed action. Despite this progress, the development and maintenance of Agent…

Software Engineering · Computer Science 2026-01-27 Ali Asgari, Annibale Panichella, Pouria Derakhshanfar, Mitchell Olsthoorn

A Comprehensive Empirical Evaluation of Agent Frameworks on Code-centric Software Engineering Tasks

Unlike traditional automation tools or static LLM-based systems, agents combine decision-making and tool utilization to accomplish complex tasks, showing great potential in software engineering. However, existing studies largely focus on…

Software Engineering · Computer Science 2025-11-04 Zhuowen Yin, Cuifeng Gao, Chunsong Fan, Wenzhang Yang, Yinxing Xue, Lijun Zhang

AI builds, We Analyze: An Empirical Study of AI-Generated Build Code Quality

The rapid adoption of AI coding agents for software development has raised important questions about the quality and maintainability of the code they produce. While prior studies have examined AI-generated source code, the impact of AI…

Software Engineering · Computer Science 2026-01-26 Anwar Ghammam, Mohamed Almukhtar

AIDev: Studying AI Coding Agents on GitHub

AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these…

Software Engineering · Computer Science 2026-02-11 Hao Li, Haoxiang Zhang, Ahmed E. Hassan

Experimenting with Multi-Agent Software Development: Towards a Unified Platform

Large language models are redefining software engineering by implementing AI-powered techniques throughout the whole software development process, including requirement gathering, software architecture, code generation, testing, and…

Software Engineering · Computer Science 2024-06-11 Malik Abdul Sami, Muhammad Waseem, Zeeshan Rasheed, Mika Saari, Kari Systä, Pekka Abrahamsson

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated…

Software Engineering · Computer Science 2026-04-30 Happy Bhati

AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans

Fine-tuning large language models for code editing has typically relied on mining commits and pull requests. The working hypothesis has been that commit messages describe human intent in natural language, and patches to code describe the…

Software Engineering · Computer Science 2026-03-30 Yangtian Zi, Zixuan Wu, Aleksander Boruch-Gruszecki, Jonathan Bell, Arjun Guha

Professional Software Developers Don't Vibe, They Control: AI Agent Use for Coding in 2025

The rise of AI agents is transforming how software can be built. The promise of agents is that developers might write code quicker, delegate multiple tasks to different agents, and even write a full piece of software purely out of natural…

Software Engineering · Computer Science 2025-12-17 Ruanqianqian Huang, Avery Reyna, Sorin Lerner, Haijun Xia, Brian Hempel

What Makes a GitHub Issue Ready for Copilot?

AI agents help developers in different coding tasks, such as developing new features, fixing bugs, and reviewing code. Developers can write a GitHub issue and assign it to an AI agent like Copilot for implementation. Based on the issue and…

Software Engineering · Computer Science 2025-12-29 Mohammed Sayagh

Can Agents Fix Agent Issues?

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are…

Artificial Intelligence · Computer Science 2025-10-27 Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou

GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git

Benchmarks for Software Engineering (SE) AI agents, most notably SWE-bench, have catalyzed progress in programming capabilities of AI agents. However, they overlook critical developer workflows such as Version Control System (VCS)…

Software Engineering · Computer Science 2025-05-29 Tobias Lindenbauer, Egor Bogomolov, Yaroslav Zharov

AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment. We introduce AgentPulse, a continuous evaluation framework scoring 50 agents across 10 workload…

Artificial Intelligence · Computer Science 2026-04-28 Yuxuan Gao, Megan Wang, Yi Ling Yu

An Empirical Study of Agent Developer Practices in AI Agent Frameworks

The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that provide standardized components, abstractions,…

Software Engineering · Computer Science 2025-12-02 Yanlin Wang, Xinyi Xu, Jiachi Chen, Tingting Bi, Wenchao Gu, Zibin Zheng

Agentic Much? Adoption of Coding Agents on GitHub

In the first half of 2025, coding agents emerged as a category of development tools that very quickly transitioned into practice. Unlike "traditional" code completion LLMs such as Copilot, agents like Cursor, Claude Code, or…

Software Engineering · Computer Science 2026-04-09 Romain Robbes, Théo Matricon, Thomas Degueule, Andre Hora, Stefano Zacchiroli

How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench

Large language models (LLMs) and their agentic frameworks are increasingly adopted to perform development tasks such as automated program repair (APR). While prior work has identified security risks in LLM-generated code, most have focused…

Cryptography and Security · Computer Science 2025-12-30 Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation

Current benchmarks for evaluating software engineering agents, such as SWE-Bench Verified, are predominantly derived from GitHub issues and fail to accurately reflect how developers interact with chat-based coding assistants in integrated…

Software Engineering · Computer Science 2026-01-27 Spandan Garg, Benjamin Steenhoek, Yufan Huang

Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild

Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to…

Software Engineering · Computer Science 2025-10-13 Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios.…

Software Engineering · Computer Science 2025-09-16 Ziyi Ni, Huacan Wang, Shuo Zhang, Shuo Lu, Ziyang He, Wang You, Zhenheng Tang, Yuntao Du, Bill Sun, Hongzhang Liu, Sen Hu, Ronghao Chen, Bo Li, Xin Li, Chen Hu, Binxing Jiao, Daxin Jiang, Pin Lyu

AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development

Large language model (LLM) based coding agents increasingly act as autonomous contributors that generate and merge pull requests, yet their real-world effects on software projects are unclear, especially compared with widely adopted…

Software Engineering · Computer Science 2026-01-28 Shyam Agarwal, Hao He, Bogdan Vasilescu