Related papers: Evaluating Software Development Agents: Patch Patt…

200 papers

AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond just…

Software Engineering · Computer Science 2026-04-10 Zhi Chen, Wei Ma, Lingxiao Jiang

AI Agents have rapidly gained prominence in both research and industry as systems that extend large language models with planning, tool use, memory, and goal-directed action. Despite this progress, the development and maintenance of Agent…

Software Engineering · Computer Science 2026-01-27 Ali Asgari, Annibale Panichella, Pouria Derakhshanfar, Mitchell Olsthoorn

Unlike traditional automation tools or static LLM-based systems, agents combine decision-making and tool utilization to accomplish complex tasks, showing great potential in software engineering. However, existing studies largely focus on…

Software Engineering · Computer Science 2025-11-04 Zhuowen Yin, Cuifeng Gao, Chunsong Fan, Wenzhang Yang, Yinxing Xue, Lijun Zhang

The rapid adoption of AI coding agents for software development has raised important questions about the quality and maintainability of the code they produce. While prior studies have examined AI-generated source code, the impact of AI…

Software Engineering · Computer Science 2026-01-26 Anwar Ghammam, Mohamed Almukhtar

AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these…

Software Engineering · Computer Science 2026-02-11 Hao Li, Haoxiang Zhang, Ahmed E. Hassan

Large language models are redefining software engineering by implementing AI-powered techniques throughout the whole software development process, including requirement gathering, software architecture, code generation, testing, and…

Software Engineering · Computer Science 2024-06-11 Malik Abdul Sami, Muhammad Waseem, Zeeshan Rasheed, Mika Saari, Kari Systä, Pekka Abrahamsson

The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated…

Software Engineering · Computer Science 2026-04-30 Happy Bhati

Fine-tuning large language models for code editing has typically relied on mining commits and pull requests. The working hypothesis has been that commit messages describe human intent in natural language, and patches to code describe the…

Software Engineering · Computer Science 2026-03-30 Yangtian Zi, Zixuan Wu, Aleksander Boruch-Gruszecki, Jonathan Bell, Arjun Guha

The rise of AI agents is transforming how software can be built. The promise of agents is that developers might write code quicker, delegate multiple tasks to different agents, and even write a full piece of software purely out of natural…

Software Engineering · Computer Science 2025-12-17 Ruanqianqian Huang, Avery Reyna, Sorin Lerner, Haijun Xia, Brian Hempel

AI agents help developers in different coding tasks, such as developing new features, fixing bugs, and reviewing code. Developers can write a GitHub issue and assign it to an AI agent like Copilot for implementation. Based on the issue and…

Software Engineering · Computer Science 2025-12-29 Mohammed Sayagh

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are…

Artificial Intelligence · Computer Science 2025-10-27 Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou

Benchmarks for Software Engineering (SE) AI agents, most notably SWE-bench, have catalyzed progress in programming capabilities of AI agents. However, they overlook critical developer workflows such as Version Control System (VCS)…

Software Engineering · Computer Science 2025-05-29 Tobias Lindenbauer, Egor Bogomolov, Yaroslav Zharov

Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment. We introduce AgentPulse, a continuous evaluation framework scoring 50 agents across 10 workload…

Artificial Intelligence · Computer Science 2026-04-28 Yuxuan Gao, Megan Wang, Yi Ling Yu

The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that provide standardized components, abstractions,…

Software Engineering · Computer Science 2025-12-02 Yanlin Wang, Xinyi Xu, Jiachi Chen, Tingting Bi, Wenchao Gu, Zibin Zheng

In the first half of 2025, coding agents have emerged as a category of development tools that have very quickly transitioned to the practice. Unlike "traditional" code completion LLMs such as Copilot, agents like Cursor, Claude Code, or…

Software Engineering · Computer Science 2026-04-09 Romain Robbes, Théo Matricon, Thomas Degueule, Andre Hora, Stefano Zacchiroli

Large language models (LLMs) and their agentic frameworks are increasingly adopted to perform development tasks such as automated program repair (APR). While prior work has identified security risks in LLM-generated code, most have focused…

Cryptography and Security · Computer Science 2025-12-30 Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Current benchmarks for evaluating software engineering agents, such as SWE-Bench Verified, are predominantly derived from GitHub issues and fail to accurately reflect how developers interact with chat-based coding assistants in integrated…

Software Engineering · Computer Science 2026-01-27 Spandan Garg, Benjamin Steenhoek, Yufan Huang

Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to…

Software Engineering · Computer Science 2025-10-13 Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios.…

Large language model (LLM) based coding agents increasingly act as autonomous contributors that generate and merge pull requests, yet their real-world effects on software projects are unclear, especially compared with widely adopted…

Software Engineering · Computer Science 2026-01-28 Shyam Agarwal, Hao He, Bogdan Vasilescu