Related papers: The Specification Gap: Coordination Failure Under …

Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural…

Software Engineering · Computer Science 2026-05-08 Francesco Dente, Dario Satriani, Paolo Papotti

When the Specification Emerges: Benchmarking Faithfulness Loss in Long-Horizon Coding Agents

Current coding-agent benchmarks usually provide the full task specification upfront. Real research coding often does not: the intended system is progressively disclosed through interaction, requiring the agent to track durable design…

Software Engineering · Computer Science 2026-03-19 Lu Yan, Xuan Chen, Xiangyu Zhang

The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents

Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds…

Human-Computer Interaction · Computer Science 2026-03-31 Yinghao Wang, Cheng Wang

Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

Multi-agent LLM systems fail in production at rates between 41% and 87%, mostly due to coordination defects rather than base-model capability. Existing responses split between cataloguing failure modes empirically and shipping declarative…

Multiagent Systems · Computer Science 2026-05-06 Maksym Nechepurenko, Pavel Shuvalov

Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

Coding agents represent a new paradigm in automated software engineering, combining the reasoning capabilities of Large Language Models (LLMs) with tool-augmented interaction loops. However, coding agents still have severe limitations.…

Software Engineering · Computer Science 2026-04-06 Tural Mehtiyev, Wesley Assunção

Agentic Separation Logic Specification Synthesis

Specification synthesis, the task of automatically inferring formal specifications from program implementations and natural language, is important for refactoring, transpilation, optimization, and verification, yet remains an open challenge…

Programming Languages · Computer Science 2026-05-28 Tarun Suresh, David Korczynski, Julien Vanegue

Aligning Requirement for Large Language Model's Code Generation

Code generation refers to the automatic generation of source code based on a given programming specification, which has garnered significant attention particularly with the advancement of large language models (LLMs). However, due to the…

Software Engineering · Computer Science 2025-09-09 Zhao Tian, Junjie Chen

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

Hypernetwork-based methods such as Doc-to-LoRA internalize a document into an LLM's weights in a single forward pass, but they fail systematically on conflicts: when the document contradicts pretraining knowledge, accuracy collapses to…

Machine Learning · Computer Science 2026-05-12 Shuaizhi Cheng, Xiang Shi, Zhiwei Zhang, Mingwei Li

When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation

The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale post-training. While retrieval-augmented generation (RAG) is commonly used to…

Software Engineering · Computer Science 2026-04-13 Ahmed Nusayer Ashik, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman, Yuan Tian

When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements

Recent advances in Large Language Models (LLMs) have upgraded them from sophisticated text generators to autonomous agents capable of cooperation and tool use in multi-agent systems (MAS). However, it remains unclear how disagreements shape…

Computation and Language · Computer Science 2025-10-03 Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Haodong Zhao, Zhuosheng Zhang, Gongshen Liu

Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Large language models are increasingly deployed in multi-agent systems to overcome context limitations by distributing information across agents. Yet whether agents can reliably compute with distributed information, rather than merely…

Multiagent Systems · Computer Science 2026-04-15 Yuzhe Zhang, Feiran Liu, Yi Shan, Xinyi Huang, Xin Yang, Yueqi Zhu, Xuxin Cheng, Cao Liu, Ke Zeng, Terry Jingchen Zhang, Wenyuan Jiang

Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution

Repository-level issue resolution benchmarks have become a standard testbed for evaluating LLM-based agents, yet success is still predominantly measured by test pass rates. In practice, however, acceptable patches must also comply with…

Software Engineering · Computer Science 2026-04-08 Kai Yu, Zhenhao Zhou, Junhao Zeng, Ying Wang, Xueying Du, Zhiqiang Yuan, Junwei Liu, Ziyu Zhou, Yujia Wang, Chong Wang, Xin Peng

More Agents Improve Math Problem Solving but Adversarial Robustness Gap Persists

When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also more robust to adversarial inputs? We investigate this question using adversarially perturbed math…

Computation and Language · Computer Science 2026-03-17 Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi

Disagreement as Data: Reasoning Trace Analytics in Multi-Agent Systems

Learning analytics researchers often analyze qualitative student data such as coded annotations or interview transcripts to understand learning processes. With the rise of generative AI, fully automated and human-AI workflows have emerged…

Computation and Language · Computer Science 2026-01-21 Elham Tajik, Conrad Borchers, Bahar Shahrokhian, Sebastian Simon, Ali Keramati, Sonika Pal, Sreecharan Sankaranarayanan

The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development

Since 2022, AI-powered coding assistants have produced contradictory evidence: controlled studies report 20-56% productivity gains on well-scoped tasks, while the most rigorous RCT documents a 19% slowdown for experienced developers, and…

Software Engineering · Computer Science 2026-05-05 Sabry E. Farrag

More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration

Large language model (LLM) agents increasingly coordinate in multi-agent systems, yet we lack an understanding of where and why cooperation failures may arise. In many real-world coordination problems, from knowledge sharing in…

Multiagent Systems · Computer Science 2026-04-10 Advait Yadav, Sid Black, Oliver Sourbut

Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation

Automated short answer scoring (ASAS) is shifting from discriminative, fine-tuned models to large language models (LLMs) used in few-shot settings. This paradigm leverages LLMs' broad world knowledge and ease of deployment, but limited…

Computation and Language · Computer Science 2026-05-26 Abigail Victoria Gurin Schleifer, Moriah Ariely, Beata Beigman Klebanov, Asaf Salman, Giora Alexandron

Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts

Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and role-specialized coordination in complex tasks. However, reliability-critical deployment remains hindered by a…

Computation and Language · Computer Science 2025-12-16 Guancheng Wan, Leixin Sun, Longxu Dou, Zitong Shi, Fang Wu, Eric Hanchen Jiang, Wenke Huang, Guibin Zhang, Hejia Geng, Xiangru Tang, Zhenfei Yin, Yizhou Sun, Wei Wang

CodeCRDT: Observation-Driven Coordination for Multi-Agent LLM Code Generation

Multi-agent LLM systems fail to realize parallel speedups due to costly coordination. We present CodeCRDT, an observation-driven coordination pattern where agents coordinate by monitoring a shared state with observable updates and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-23 Sergey Pugachev

Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems

Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches…

Hardware Architecture · Computer Science 2025-09-10 Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine Lin, Yong Liu, Haoxing Ren