Related papers: Multi-Agent Dialectical Refinement for Enhanced Ar…

Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness

The remarkable growth in large language model (LLM) capabilities has spurred exploration into multi-agent systems, with debate frameworks emerging as a promising avenue for enhanced problem-solving. These multi-agent debate (MAD)…

Artificial Intelligence · Computer Science · 2025-06-23 · Yongjin Yang, Euiin Yi, Jongwoo Ko, Kimin Lee, Zhijing Jin, Se-Young Yun

Free-MAD: Consensus-Free Multi-Agent Debate

Multi-agent debate (MAD) is an emerging approach to improving the reasoning capabilities of large language models (LLMs). Existing MAD methods rely on multiple rounds of interaction among agents to reach consensus, and the final output is…

Artificial Intelligence · Computer Science · 2025-09-16 · Yu Cui, Hang Fu, Haibin Zhang, Licheng Wang, Cong Zuo

Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models

Context: Large Language Model (LLM) agents are becoming widely used for various Requirements Engineering (RE) tasks. Research on improving their accuracy mainly focuses on prompt engineering, model fine-tuning, and retrieval augmented…

Software Engineering · Computer Science · 2025-11-20 · Marc Oriol, Quim Motger, Jordi Marco, Xavier Franch

Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

Recent advancements in large language models (LLMs) underscore their potential for responding to inquiries in various domains. However, ensuring that generative agents provide accurate and reliable answers remains an ongoing challenge. In…

Computation and Language · Computer Science · 2024-07-19 · Andries Smit, Paul Duckworth, Nathan Grinsztajn, Thomas D. Barrett, Arnu Pretorius

Self-Improvement of Language Models by Post-Training on Multi-Agent Debate

Self-improvement, where models improve beyond their current performance without external supervision, remains a challenge. The core difficulty is sourcing a training signal stronger than what the model itself can currently produce. Majority…

Artificial Intelligence · Computer Science · 2026-02-02 · Ankur Samanta, Akshayaa Magesh, Runzhe Wu, Ayush Jain, Youliang Yu, Daniel Jiang, Boris Vidolov, Paul Sajda, Yonathan Efroni, Kaveh Hassani

Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification

Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by…

Computation and Language · Computer Science · 2026-05-13 · Masnun Nuha Chowdhury, Nusrat Jahan Beg, Umme Hunny Khan, Syed Rifat Raiyan, Md Kamrul Hasan, Hasan Mahmud

Multi-Agent Debate with Memory Masking

Large language models (LLMs) have recently demonstrated impressive capabilities in reasoning tasks. Currently, mainstream LLM reasoning frameworks predominantly focus on scaling up inference-time sampling to enhance performance. In…

Computation and Language · Computer Science · 2026-03-24 · Hongduan Tian, Xiao Feng, Ziyuan Zhao, Xiangyu Zhu, Rolan Yan, Bo Han

Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation

Large Language Models (LLMs) have advanced autonomous agents' planning and decision-making, yet they struggle with complex tasks requiring diverse expertise and multi-step reasoning. Multi-Agent Debate (MAD) systems, introduced in NLP…

Software Engineering · Computer Science · 2025-03-18 · Jina Chun, Qihong Chen, Jiawei Li, Iftekhar Ahmed

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies…

Computation and Language · Computer Science · 2026-01-29 · Xiaochen Zhu, Caiqi Zhang, Yizhou Chi, Tom Stafford, Nigel Collier, Andreas Vlachos

iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference

Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi-Agent Debate (MAD) has emerged as a promising…

Computation and Language · Computer Science · 2025-12-03 · Wei Fan, JinYi Yoon, Bo Ji

Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection

Tabular anomaly detection is often handled by single detectors or static ensembles, even though strong performance on tabular data typically comes from heterogeneous model families (e.g., tree ensembles, deep tabular networks, and tabular…

Machine Learning · Computer Science · 2026-02-17 · Pinqiao Wang, Sheng Li

Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

Fact-checking research has extensively explored verification but less so the generation of natural-language explanations, crucial for user trust. While Large Language Models (LLMs) excel in text generation, their capability for producing…

Computation and Language · Computer Science · 2024-02-13 · Kyungha Kim, Sangyun Lee, Kung-Hsiang Huang, Hou Pong Chan, Manling Li, Heng Ji

MALLM: Multi-Agent Large Language Models Framework

Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated…

Multiagent Systems · Computer Science · 2025-12-16 · Jonas Becker, Lars Benedikt Kaesberg, Niklas Bauer, Jan Philip Wahle, Terry Ruas, Bela Gipp

When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning

Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to…

Artificial Intelligence · Computer Science · 2026-04-13 · Hyeong Kyu Choi, Xiaojin Zhu, Sharon Li

From Belief Entrenchment to Robust Reasoning in LLM Agents

Multi-Agent Debate (MAD) has emerged as a promising inference scaling method for Large Language Model (LLM) reasoning. However, it frequently suffers from belief entrenchment, where agents reinforce shared errors rather than correcting…

Machine Learning · Computer Science · 2026-02-12 · Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun

OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning

Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents.…

Artificial Intelligence · Computer Science · 2025-10-22 · Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

Multi-Agent Debate (MAD) has shown promise in leveraging collective intelligence to improve reasoning and reduce hallucinations, yet it remains unclear how information exchange shapes the underlying ability. Empirically, MAD exhibits…

Multiagent Systems · Computer Science · 2026-03-03 · Dan Qiao, Binbin Chen, Fengyu Cai, Jianlong Chen, Wenhao Li, Fuxin Jiang, Zuzhi Chen, Hongyuan Zha, Tieying Zhang, Baoxiang Wang

Stop Overvaluing Multi-Agent Debate -- We Must Rethink Evaluation and Embrace Model Heterogeneity

Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs). Despite its conceptual appeal, current MAD research…

Computation and Language · Computer Science · 2025-06-24 · Hangfan Zhang, Zhiyao Cui, Jianhao Chen, Xinrun Wang, Qiaosheng Zhang, Zhen Wang, Dinghao Wu, Shuyue Hu

Tool-MAD: A Multi-Agent Debate Framework for Fact Verification with Diverse Tool Augmentation and Adaptive Retrieval

Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks. Multi-Agent Debate (MAD) systems aim to improve answer accuracy by enabling multiple LLM agents…

Computation and Language · Computer Science · 2026-01-09 · Seyeon Jeong, Yeonjun Choi, JongWook Kim, Beakcheol Jang

Multi-Agent Causal Discovery Using Large Language Models

Causal discovery aims to identify causal relationships between variables and is a fundamental problem across the sciences. Traditional statistical causal discovery (SCD) methods rely solely on observational data and ignore the contextual…

Artificial Intelligence · Computer Science · 2026-05-27 · Hao Duong Le, Xin Xia, Haijie Xu, Chen Zhang