Related papers: Why Do Multi-Agent LLM Systems Fail?

MAS-FIRE: Fault Injection and Reliability Evaluation for LLM-Based Multi-Agent Systems

As LLM-based Multi-Agent Systems (MAS) are increasingly deployed for complex tasks, ensuring their reliability has become a pressing challenge. Since MAS coordinate through unstructured natural language rather than rigid protocols, they are…

Software Engineering · Computer Science · 2026-02-24 · Jin Jia, Zhiling Deng, Zhuangbin Chen, Yingqi Wang, Zibin Zheng

Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis

Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promise, MASs are not without their flaws. However,…

Software Engineering · Computer Science · 2025-09-18 · Yu Ge, Linna Xie, Zhong Li, Yu Pei, Tian Zhang

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

Failure attribution in LLM multi-agent systems (identifying the agent and step responsible for task failures) provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate…

Multiagent Systems · Computer Science · 2025-06-03 · Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, Qingyun Wu

AgentAsk: Multi-Agent Systems Need to Ask

Multi-agent systems (MAS) built on large language models promise improved problem-solving through collaboration, yet they often fail to consistently outperform strong single-agent baselines due to error propagation at inter-agent message…

Artificial Intelligence · Computer Science · 2026-01-21 · Bohan Lin, Kuo Yang, Zelin Tan, Yingchuan Lai, Chen Zhang, Guibin Zhang, Xinlei Yu, Miao Yu, Xu Wang, Yudong Zhang, Yang Wang

Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks

Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rates without systematically analyzing the…

Artificial Intelligence · Computer Science · 2025-08-19 · Ruofan Lu, Yichen Li, Yintong Huo

Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference

Multi-agent systems (MAS) are critical for automating complex tasks, yet their practical deployment is severely hampered by the challenge of failure attribution. Current diagnostic tools, which rely on statistical correlations, are…

Artificial Intelligence · Computer Science · 2025-09-11 · Guoqing Ma, Jia Zhu, Hanghui Guo, Weijie Shi, Jiawei Shen, Jingjiang Liu, Yidan Liang

A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems

The rapid emergence of multi-agent AI systems (MAS), including LangChain, CrewAI, and AutoGen, has shaped how large language model (LLM) applications are developed and orchestrated. However, little is known about how these systems evolve…

Software Engineering · Computer Science · 2026-01-13 · Daniel Liu, Krishna Upadhyay, Vinaik Chhetri, A. B. Siddique, Umar Farooq

Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems

Organisations are starting to adopt LLM-based AI agents, with their deployments naturally evolving from single agents towards interconnected, multi-agent networks. Yet a collection of safe agents does not guarantee a safe collection of…

Multiagent Systems · Computer Science · 2025-08-11 · Alistair Reid, Simon O'Callaghan, Liam Carroll, Tiberio Caetano

When Does Multi-Agent Collaboration Help? An Entropy Perspective

Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically…

Multiagent Systems · Computer Science · 2026-05-11 · Yuxuan Zhao, Sijia Chen, Ningxin Su

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks. However, to design effective MAS, existing approaches heavily rely on manual configurations or multiple calls of advanced LLMs, resulting in…

Computation and Language · Computer Science · 2025-03-06 · Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen, Jing Shao

Towards Self-Improving Error Diagnosis in Multi-Agent Systems

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation.…

Multiagent Systems · Computer Science · 2026-04-21 · Jiazheng Li, Emine Yilmaz, Bei Chen, Dieu-Thu Le

Understanding and Bridging the Planner-Coder Gap: A Systematic Study on the Robustness of Multi-Agent Systems for Code Generation

Multi-agent systems (MASs) have emerged as a promising paradigm for automated code generation, demonstrating impressive performance on established benchmarks. Despite their prosperous development, the fundamental mechanisms underlying their…

Software Engineering · Computer Science · 2026-02-02 · Zongyi Lyu, Songqiang Chen, Zhenlan Ji, Liwen Wang, Shuai Wang, Daoyuan Wu, Wenxuan Wang, Shing-Chi Cheung

Single-agent or Multi-agent Systems? Why Not Both?

Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported the superior accuracy performance of MAS across diverse domains, enabled by…

Multiagent Systems · Computer Science · 2025-05-27 · Mingyan Gao, Yanzi Li, Banruo Liu, Yifan Yu, Phillip Wang, Ching-Yu Lin, Fan Lai

MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems

LLM-based multi-agent systems (MAS) have demonstrated significant potential in enhancing single LLMs to address complex and diverse tasks in practical applications. Despite considerable advancements, the field lacks a unified codebase that…

Computation and Language · Computer Science · 2025-05-23 · Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai, Siheng Chen

Latent Collaboration in Multi-Agent Systems

Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we…

Computation and Language · Computer Science · 2025-12-09 · Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, Ling Yang

Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

Coding agents represent a new paradigm in automated software engineering, combining the reasoning capabilities of Large Language Models (LLMs) with tool-augmented interaction loops. However, coding agents still have severe limitations.…

Software Engineering · Computer Science · 2026-04-06 · Tural Mehtiyev, Wesley Assunção

Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity

LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents; however, we find that…

Artificial Intelligence · Computer Science · 2026-02-04 · Yingxuan Yang, Chengrui Qu, Muning Wen, Laixi Shi, Ying Wen, Weinan Zhang, Adam Wierman, Shangding Gu

ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models

Large Language Models (LLM) benchmarks tell us when models fail, but not why they fail. A wrong answer on a reasoning dataset may stem from formatting issues, calculation errors, or dataset noise rather than weak reasoning. Without…

Artificial Intelligence · Computer Science · 2026-02-18 · Shir Ashury-Tahan, Yifan Mai, Elron Bandel, Michal Shmueli-Scheuer, Leshem Choshen

MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) often exhibit high variance in their reasoning trajectories. Process verification, which evaluates intermediate steps in trajectories, has shown promise in general reasoning…

Artificial Intelligence · Computer Science · 2026-02-04 · Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Austin Xu, Xiaoxiao He, Yingbo Zhou, Semih Yavuz, Hao Wang, Shafiq Joty

FLARE: Agentic Coverage-Guided Fuzzing for LLM-Based Multi-Agent Systems

Multi-Agent LLM Systems (MAS) have been adopted to automate complex human workflows by breaking down tasks into subtasks. However, due to the non-deterministic behavior of LLM agents and the intricate interactions between agents, MAS…

Software Engineering · Computer Science · 2026-04-08 · Mingxuan Hui, Xinyue Li, Lu Wang, Chengcheng Wan, Yifan Wang, Yimian Wang, Feiyue Song, Beining Shi, Yixi Li, Yaxiao Li