Related papers: $\lambda_A$: A Typed Lambda Calculus for LLM Agent…

Learning to Configure Agentic AI Systems

Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by fixed templates or hand-tuned heuristics that apply the same…

Artificial Intelligence · Computer Science 2026-05-22 Aditya Taparia, Som Sagar, Ransalu Senanayake

Collaboration Dynamics and Reliability Challenges of Multi-Agent LLM Systems in Finite Element Analysis

Large Language Model (LLM)-based multi-agent systems are increasingly applied to automate computational workflows in science and engineering. However, how inter-agent dynamics influence reasoning quality and verification reliability remains…

Artificial Intelligence · Computer Science 2025-11-07 Chuan Tian, Yilei Zhang

Agentic Model Checking

Verifying LLM-generated systems code is hard: bugs are prevalent, formal specifications are missing, and safety contracts are encoded implicitly at call sites rather than enforced at function boundaries. We propose agentic model checking, a…

Software Engineering · Computer Science 2026-05-21 Youcheng Sun, Jiawen Liu, Daniel Kroening, Jason Xue

AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation

LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM leaderboards and tool/agent benchmarks…

Artificial Intelligence · Computer Science 2026-03-05 Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, Dimitris N. Metaxas, Min Xu

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding conversation. AI agents build such conversations automatically: given an initial human prompt, a…

Programming Languages · Computer Science 2026-02-24 Zac Garby, Andrew D. Gordon, David Sands

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents

LLM agents are increasingly deployed to plan, retrieve, and write with tools, yet evaluation still leans on static benchmarks and small human studies. We present the Agent-Testing Agent (ATA), a meta-agent that combines static code…

Computation and Language · Computer Science 2025-08-26 Sameer Komoravolu, Khalil Mrini

A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis

Large language models (LLMs) have exhibited remarkable capabilities across diverse open-domain tasks, yet their application in specialized domains such as civil engineering remains largely unexplored. This paper starts bridging this gap by…

Computation and Language · Computer Science 2025-07-08 Jiachen Liu, Ziheng Geng, Ran Cao, Lu Cheng, Paolo Bocchini, Minghui Cheng

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from isolated operation to collaborative ecosystems, we witness the…

Artificial Intelligence · Computer Science 2026-05-20 Yixiang Yao, Yuhang Yao, Xinyi Fan, Jiechao Gao, Jie Wang, Minjia Zhang, Srivatsan Ravi, Carlee Joe-Wong

Multi-Agent LLMs for Generating Research Limitations

Identifying and articulating limitations is essential for transparent and rigorous scientific research. However, zero-shot large language model (LLM) approaches often produce superficial or general limitation statements (e.g., dataset bias…

Computation and Language · Computer Science 2026-03-17 Ibrahim Al Azher, Zhishuai Guo, Hamed Alhoori

Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents

AI agents are systems capable of perceiving their environment, autonomously planning and executing tasks. Recent advancements in LLM have introduced a transformative paradigm for AI agents, enabling them to interact with external resources…

Software Engineering · Computer Science 2024-12-30 Kaiwen Ning, Jiachi Chen, Jingwen Zhang, Wei Li, Zexu Wang, Yuming Feng, Weizhe Zhang, Zibin Zheng

Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering

Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by…

Computation and Language · Computer Science 2025-02-11 Wei Zhou, Mohsen Mesgar, Annemarie Friedrich, Heike Adel

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. Driven by the growing need for standardized evaluation and integration, we…

Artificial Intelligence · Computer Science 2026-03-10 Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah

Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs

Large language models (LLMs) are increasingly being applied to programming tasks, ranging from single-turn code completion to autonomous agents. Current code agent designs frequently depend on complex, hand-crafted workflows and tool sets.…

Artificial Intelligence · Computer Science 2025-10-01 Hankun Dai, Maoquan Wang, Mengnan Qi, Yikai Zhang, Zijian Jin, Yongqiang Yao, Yufan Huang, Shengyu Fu, Elsie Nallipogu

A Lightweight Large Language Model-Based Multi-Agent System for 2D Frame Structural Analysis

Large language models (LLMs) have recently been used to empower autonomous agents in engineering, significantly improving automation and efficiency in labor-intensive workflows. However, their potential remains underexplored in structural…

Computation and Language · Computer Science 2025-10-08 Ziheng Geng, Jiachen Liu, Ran Cao, Lu Cheng, Haifeng Wang, Minghui Cheng

PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback

Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot: upstream providers throttle without warning, sub-agents drift the task to fit accessible tools,…

Artificial Intelligence · Computer Science 2026-05-26 Sasank Annapureddy

HADA: Human-AI Agent Decision Alignment Architecture

We present HADA (Human-AI Agent Decision Alignment), a protocol- and framework-agnostic reference architecture that keeps both large language model (LLM) agents and legacy algorithms aligned with organizational targets and values. HADA…

Artificial Intelligence · Computer Science 2025-06-06 Tapio Pitkäranta, Leena Pitkäranta

MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering

Recent advances in Large Language Models (LLMs) have significantly improved table understanding tasks such as Table Question Answering (TableQA), yet challenges remain in ensuring reliability, scalability, and efficiency, especially in…

Computation and Language · Computer Science 2026-04-22 Sieun Hyeon, Jusang Oh, Sunghwan Steve Cho, Jaeyoung Do

RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents

The rapid adoption of Large Language Models (LLMs) in interactive systems has enabled the creation of dynamic, open-ended Role-Playing Agents (RPAs). However, evaluating these agents remains a significant challenge, as standard NLP metrics…

Computation and Language · Computer Science 2026-04-14 Riccardo Rosati, Edoardo Colucci, Massimiliano Bolognini, Adriano Mancini, Paolo Sernani

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

Evaluating large language models (LLM) in clinical scenarios is crucial to assessing their potential clinical utility. Existing benchmarks rely heavily on static question-answering, which does not accurately depict the complex, sequential…

Human-Computer Interaction · Computer Science 2025-05-27 Samuel Schmidgall, Rojin Ziaei, Carl Harris, Eduardo Reis, Jeffrey Jopling, Michael Moor