Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Subjects: cs.SE, Artificial Intelligence | Date: 2026-05 | Version: v1

Abstract

Consensus protocols form the backbone of distributed systems and blockchains, where implementation bugs can cause data corruption and financial losses. While LLM-based approaches show promise in code analysis, they struggle with deep protocol-level logic bugs involving complex state-dependent behaviors across multiple execution stages. We present Agora, a domain-aware multi-agent framework that integrates hypothesis-driven testing with LLM capabilities for systematic protocol verification. Agora employs specialized agents that collaboratively explore protocol state spaces, synthesize attack scenarios using domain-specific constraints, and validate findings through iterative refinement. This explicit role separation enables reasoning about global protocol invariants beyond single-function code analysis. We evaluate Agora on four consensus implementations (Raft, EPaxos, HotStuff, BullShark) using four state-of-the-art LLMs. Agora discovers 15 previously unknown protocol-level logic bugs that violate safety properties, while existing LLM-based agents fail to detect any such bugs. Our results demonstrate that domain-aware multi-agent collaboration is essential for detecting deep logic bugs in complex protocols.

Comments: 35 pages, 4 figures

Cite

@article{arxiv.2605.29910,
  title  = {Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents},
  author = {Xiang Liu and Sa Song and Zhaowei Zhang and Huiying Lan and Jason Zeng and Ming Wu and Michael Heinrich and Yong Sun and Ceyao Zhang},
  journal = {arXiv preprint arXiv:2605.29910},
  year   = {2026}
}