Related papers: FullStack-Agent: Enhancing Agentic Full-Stack Web …

Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development

Developing full-stack web applications is complex and time-intensive, demanding proficiency across diverse technologies and frameworks. Although recent advances in multimodal large language models (MLLMs) enable automated webpage generation…

Software Engineering · Computer Science 2025-10-02 Yuxuan Wan, Tingshuo Liang, Jiakai Xu, Jingyu Xiao, Yintong Huo, Michael R. Lyu

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Agents powered by large language models (LLMs) are increasingly adopted in the software industry, contributing code as collaborators or even autonomous developers. As their presence grows, it becomes important to assess the current…

Software Engineering · Computer Science 2026-02-12 Qixing Zhou, Jiacheng Zhang, Haiyang Wang, Rui Hao, Jiahe Wang, Minghao Han, Yuxue Yang, Shuzhe Wu, Feiyang Pan, Lue Fan, Dandan Tu, Zhaoxiang Zhang

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven problem solving. However, current benchmarks…

Software Engineering · Computer Science 2026-01-19 Jie Yang, Honglin Guo, Li Ji, Jiazheng Zhou, Rui Zheng, Zhikai Lei, Shuo Zhang, Zhiheng Xi, Shichun Liu, Yuxin Wang, Bo Wang, Yining Zheng, Tao Gui, Xipeng Qiu

Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?

Code completion, a key downstream task in code generation, is one of the most frequent and impactful methods for enhancing developer productivity in software development. As intelligent completion tools evolve, we need a robust evaluation…

Software Engineering · Computer Science 2024-10-25 Zhenyu Pan, Rongyu Cao, Yongchang Cao, Yingwei Ma, Binhua Li, Fei Huang, Han Liu, Yongbin Li

SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation

Smart contracts are the backbone of the decentralized web, yet ensuring their functional correctness and security remains a critical challenge. While Large Language Models (LLMs) have shown promise in code generation, they often struggle…

Software Engineering · Computer Science 2026-02-02 Wei Chen, Zhiyuan Peng, Xin Yin, Chao Ni, Chenhao Ying, Bang Xie, Yuan Luo

TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

Code translation transforms code between programming languages while preserving functionality, which is critical in software development and maintenance. While traditional learning-based code translation methods have limited effectiveness…

Software Engineering · Computer Science 2026-04-08 Zhiqiang Yuan, Weitong Chen, Hanlin Wang, Xin Peng, Zhenpeng Chen, Yiling Lou

LLM Agents Making Agent Tools

Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers,…

Computation and Language · Computer Science 2025-06-02 Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, Jakob Nikolas Kather

MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests

Testing RESTful APIs is increasingly important in quality assurance of cloud-native applications. Recent advances in machine learning (ML) techniques have demonstrated that various testing activities can be performed automatically by large…

Software Engineering · Computer Science 2025-11-25 Xiaoke Han, Hong Zhu

daVinci-Dev: Agent-native Mid-training for Software Engineering

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm where models autonomously navigate, edit, and test complex repositories. While…

Software Engineering · Computer Science 2026-01-28 Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu

DocAgent: A Multi-Agent System for Automated Code Documentation Generation

High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete,…

Software Engineering · Computer Science 2025-05-27 Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

WebDancer: Towards Autonomous Information Seeking Agency

Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In…

Computation and Language · Computer Science 2025-08-12 Jialong Wu, Baixuan Li, Runnan Fang, Wenbiao Yin, Liwen Zhang, Zhengwei Tao, Dingchu Zhang, Zekun Xi, Gang Fu, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges

Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code…

Software Engineering · Computer Science 2024-08-12 Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin

AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents

Large Language Model (LLM) Agents have demonstrated remarkable capabilities in task automation and intelligent decision-making, driving the widespread adoption of agent development frameworks such as LangChain and AutoGen. However, these…

Artificial Intelligence · Computer Science 2025-10-10 Jiabin Tang, Tianyu Fan, Chao Huang

LocAgent: Graph-Guided LLM Agents for Code Localization

Code localization--identifying precisely where in a codebase changes need to be made--is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying…

Software Engineering · Computer Science 2025-04-30 Zhaoling Chen, Xiangru Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, Xingyao Wang

Autonomous Legacy Web Application Upgrades Using a Multi-Agent System

The use of Large Language Models (LLMs) for autonomous code generation is gaining attention in emerging technologies. As LLM capabilities expand, they offer new possibilities such as code refactoring, security enhancements, and legacy…

Software Engineering · Computer Science 2025-02-03 Valtteri Ala-Salmi, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Kai-Kristian Kemell, Jussi Rasku, Shahbaz Siddeeq, Mika Saari, Pekka Abrahamsson

CodeAgent: Autonomous Communicative Agents for Code Review

Code review, which aims at ensuring the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community is looking to…

Software Engineering · Computer Science 2024-09-26 Xunzhu Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Agent systems powered by large language models (LLMs) have demonstrated impressive performance on repository-level code-generation tasks. However, for tasks such as website codebase generation, which depend heavily on visual effects and…

Computation and Language · Computer Science 2025-09-29 Zimu Lu, Houxing Ren, Yunqiao Yang, Ke Wang, Zhuofan Zong, Junting Pan, Mingjie Zhan, Hongsheng Li

BabelCoder: Agentic Code Translation with Specification Alignment

As software systems evolve, developers increasingly work across multiple programming languages and often face the need to migrate code from one language to another. While automatic code translation offers a promising solution, it has long…

Software Engineering · Computer Science 2025-12-09 Fazle Rabbi, Soumit Kanti Saha, Tri Minh Triet Pham, Song Wang, Jinqiu Yang

ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development

Recent coding agents can generate complete codebases from simple prompts, yet existing evaluations focus on issue-level bug fixing and lag behind end-to-end development. We introduce ProjDevBench, an end-to-end benchmark that provides…

Artificial Intelligence · Computer Science 2026-02-10 Pengrui Lu, Shiqi Zhang, Yunzhong Hou, Lyumanshan Ye, Chaoyi Huang, Zixi Chen, Ji Zeng, Hantao Jiang, Pengfei Liu, Yiwei Wang, Ming-Hsuan Yang

FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Front-end engineering involves a complex workflow where engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks primarily focus on converting visual designs to code, we…

Computation and Language · Computer Science 2025-05-27 Haoyu Sun, Huichen Will Wang, Jiawei Gu, Linjie Li, Yu Cheng