Related papers: The Specification Gap: Coordination Failure Under …

200 papers

Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural…

Software Engineering · Computer Science 2026-05-08 Francesco Dente, Dario Satriani, Paolo Papotti

Current coding-agent benchmarks usually provide the full task specification upfront. Real research coding often does not: the intended system is progressively disclosed through interaction, requiring the agent to track durable design…

Software Engineering · Computer Science 2026-03-19 Lu Yan, Xuan Chen, Xiangyu Zhang

Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds…

Human-Computer Interaction · Computer Science 2026-03-31 Yinghao Wang, Cheng Wang

Multi-agent LLM systems fail in production at rates between 41% and 87%, mostly due to coordination defects rather than base-model capability. Existing responses split between cataloguing failure modes empirically and shipping declarative…

Multiagent Systems · Computer Science 2026-05-06 Maksym Nechepurenko, Pavel Shuvalov

Coding agents represent a new paradigm in automated software engineering, combining the reasoning capabilities of Large Language Models (LLMs) with tool-augmented interaction loops. However, coding agents still have severe limitations.…

Software Engineering · Computer Science 2026-04-06 Tural Mehtiyev, Wesley Assunção

Specification synthesis, the task of automatically inferring formal specifications from program implementations and natural language, is important for refactoring, transpilation, optimization, and verification, yet remains an open challenge…

Programming Languages · Computer Science 2026-05-28 Tarun Suresh, David Korczynski, Julien Vanegue

Code generation refers to the automatic generation of source code based on a given programming specification, which has garnered significant attention particularly with the advancement of large language models (LLMs). However, due to the…

Software Engineering · Computer Science 2025-09-09 Zhao Tian, Junjie Chen

Hypernetwork-based methods such as Doc-to-LoRA internalize a document into an LLM's weights in a single forward pass, but they fail systematically on conflicts: when the document contradicts pretraining knowledge, accuracy collapses to…

Machine Learning · Computer Science 2026-05-12 Shuaizhi Cheng, Xiang Shi, Zhiwei Zhang, Mingwei Li

The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale post-training. While retrieval-augmented generation (RAG) is commonly used to…

Software Engineering · Computer Science 2026-04-13 Ahmed Nusayer Ashik, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman, Yuan Tian

Recent advances in Large Language Models (LLMs) have upgraded them from sophisticated text generators to autonomous agents capable of cooperation and tool use in multi-agent systems (MAS). However, it remains unclear how disagreements shape…

Computation and Language · Computer Science 2025-10-03 Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Haodong Zhao, Zhuosheng Zhang, Gongshen Liu

Large language models are increasingly deployed in multi-agent systems to overcome context limitations by distributing information across agents. Yet whether agents can reliably compute with distributed information, rather than merely…

Multiagent Systems · Computer Science 2026-04-15 Yuzhe Zhang, Feiran Liu, Yi Shan, Xinyi Huang, Xin Yang, Yueqi Zhu, Xuxin Cheng, Cao Liu, Ke Zeng, Terry Jingchen Zhang, Wenyuan Jiang

Repository-level issue resolution benchmarks have become a standard testbed for evaluating LLM-based agents, yet success is still predominantly measured by test pass rates. In practice, however, acceptable patches must also comply with…

Software Engineering · Computer Science 2026-04-08 Kai Yu, Zhenhao Zhou, Junhao Zeng, Ying Wang, Xueying Du, Zhiqiang Yuan, Junwei Liu, Ziyu Zhou, Yujia Wang, Chong Wang, Xin Peng

When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also more robust to adversarial inputs? We investigate this question using adversarially perturbed math…

Computation and Language · Computer Science 2026-03-17 Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi

Learning analytics researchers often analyze qualitative student data such as coded annotations or interview transcripts to understand learning processes. With the rise of generative AI, fully automated and human-AI workflows have emerged…

Computation and Language · Computer Science 2026-01-21 Elham Tajik, Conrad Borchers, Bahar Shahrokhian, Sebastian Simon, Ali Keramati, Sonika Pal, Sreecharan Sankaranarayanan

Since 2022, AI-powered coding assistants have produced contradictory evidence: controlled studies report 20-56% productivity gains on well-scoped tasks, while the most rigorous RCT documents a 19% slowdown for experienced developers, and…

Software Engineering · Computer Science 2026-05-05 Sabry E. Farrag

Large language model (LLM) agents increasingly coordinate in multi-agent systems, yet we lack an understanding of where and why cooperation failures may arise. In many real-world coordination problems, from knowledge sharing in…

Multiagent Systems · Computer Science 2026-04-10 Advait Yadav, Sid Black, Oliver Sourbut

Automated short answer scoring (ASAS) is shifting from discriminative, fine-tuned models to large language models (LLMs) used in few-shot settings. This paradigm leverages LLMs' broad world knowledge and ease of deployment, but limited…

Computation and Language · Computer Science 2026-05-26 Abigail Victoria Gurin Schleifer, Moriah Ariely, Beata Beigman Klebanov, Asaf Salman, Giora Alexandron

Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and role-specialized coordination in complex tasks. However, reliability-critical deployment remains hindered by a…

Multi-agent LLM systems fail to realize parallel speedups due to costly coordination. We present CodeCRDT, an observation-driven coordination pattern where agents coordinate by monitoring a shared state with observable updates and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-23 Sergey Pugachev

Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches…

Hardware Architecture · Computer Science 2025-09-10 Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine Lin, Yong Liu, Haoxing Ren