Related papers: A Policy-Driven Runtime Layer for Agentic LLM Serv…

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows

Large language model (LLM) based agentic workflows have become a popular paradigm for coordinating multiple specialized agents to solve complex tasks. To improve serving efficiency, existing LLM systems employ prefix caching to reuse…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-11 Zaifeng Pan, Ajjkumar Patel, Zhengding Hu, Yipeng Shen, Yue Guan, Wan-Lu Li, Lianhui Qin, Yida Wang, Yufei Ding

Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective

Agentic workflows are composed of sequences of interdependent Large Language Model (LLM) calls, and they have become a dominant workload in modern AI systems. These workflows exhibit extensive redundancy from overlapping prompts and…

Multiagent Systems · Computer Science 2026-03-18 Noppanat Wadlom, Junyi Shen, Yao Lu

Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks

Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with increasingly large context windows. However,…

Computation and Language · Computer Science 2026-02-03 Elias Lumer, Faheem Nizar, Akshaya Jangiti, Kevin Frank, Anmol Gulati, Mandar Phadate, Vamse Kumar Subbiah

A Systematic Study of LLM-Based Architectures for Automated Patching

Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs,…

Cryptography and Security · Computer Science 2026-03-03 Qingxiao Xu, Ze Sheng, Zhicheng Chen, Jeff Huang

Supporting Dynamic Agentic Workloads: How Data and Agents Interact

The rise of multi-agent systems powered by large language models (LLMs) and specialized reasoning agents exposes fundamental limitations in today's data management architectures. Traditional databases and data fabrics were designed for…

Multiagent Systems · Computer Science 2025-12-11 Ioana Giurgiu, Michael E. Nidd

Multi-Layer Scheduling for MoE-Based LLM Reasoning

Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, but serving them efficiently at scale remains a critical challenge due to their substantial computational and latency demands. While most existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-04 Yifan Sun, Gholamreza Haffari, Minxian Xu, Rajkumar Buyya, Adel N. Toosi

Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents

LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements. Existing LLM caching techniques (like context…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-28 Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun

Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices

Multi-agent LLM systems on edge devices face a memory management problem: device RAM is too small to hold every agent's KV cache simultaneously. On Apple M4 Pro with 10.2 GB of cache budget, only 3 agents fit at 8K context in FP16. A…

Machine Learning · Computer Science 2026-03-06 Yakov Pyotr Shkolnikov

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

As LLM agents evolve into collaborative multi-agent systems, their memory requirements grow rapidly in complexity. This position paper frames multi-agent memory as a computer architecture problem. We distinguish shared and distributed…

Hardware Architecture · Computer Science 2026-04-01 Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao

AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU

Large language models (LLMs) are increasingly deployed as AI agents that operate in short reasoning-action loops, interleaving model computation with external calls. Unlike traditional chat applications, these agentic workloads require…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-12 Yuning Zhang, Yan Yan, Nan Yang, Dong Yuan

Agentic AI Workload Characteristics

Agentic AI shifts LLM serving from isolated prompt-generation requests to stateful, multi-turn executions that repeatedly invoke the model, call tools, and grow context over time. This paper characterizes ReAct-style agents from both the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Yichao Yuan, Ankita Nayak, Souvik Kundu, Nishil Talati

Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Agent orchestration frameworks have proliferated, collectively exceeding 290,000 GitHub stars across LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, Semantic Kernel, Strands, and LlamaIndex. All follow the same pattern: an external…

Artificial Intelligence · Computer Science 2026-05-22 Simon Dennis, Rivaan Patil, Kevin Shabahang, Hao Guo

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how…

Multiagent Systems · Computer Science 2026-05-28 Nicole Koenigstein

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Multi-agent large language model (LLM) systems are increasingly adopted for complex language processing tasks that require communication and coordination among agents. However, these systems often suffer substantial overhead from repeated…

Multiagent Systems · Computer Science 2025-11-04 Hancheng Ye, Zhengqi Gao, Mingyuan Ma, Qinsi Wang, Yuzhe Fu, Ming-Yu Chung, Yueqian Lin, Zhijian Liu, Jianyi Zhang, Danyang Zhuo, Yiran Chen

What Limits Agentic Systems Efficiency?

Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated strong reasoning capabilities. To further enhance LLM capabilities, recent agentic systems, such as Deep Research, incorporate web interactions into LLM…

Artificial Intelligence · Computer Science 2025-10-21 Song Bian, Minghao Yan, Anand Jayarajan, Gennady Pekhimenko, Shivaram Venkataraman

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. However, these collaborative architectures introduce a critical bottleneck: redundant prefill…

Machine Learning · Computer Science 2026-03-17 Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, Jiang Liu

Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline

Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks…

Multiagent Systems · Computer Science 2026-01-21 Jiawei Xu, Arief Koesdwiady, Sisong Bei, Yan Han, Baixiang Huang, Dakuo Wang, Yutong Chen, Zheshen Wang, Peihao Wang, Pan Li, Ying Ding

Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

Large language models are increasingly deployed as complex agentic systems that scale with task complexity. While prior work has extensively explored model- and system-level scaling, algorithm- and task-level scaling remain largely…

Artificial Intelligence · Computer Science 2026-04-21 Zizhang Luo, Yuhao Luo, Youwei Xiao, Yansong Xu, Runlin Guo, Yun Liang

When LLMs Team Up: A Coordinated Attack Framework for Automated Cyber Intrusions

Automated intrusion-style workflows require LLM agents to reason over partial observations, tool outputs, and executable artifacts under bounded budgets. A single LLM instance often compresses evidence extraction, planning, execution, and…

Cryptography and Security · Computer Science 2026-05-12 Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu, Qin Wang, Wanlei Zhou

Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

LLM-based agents for industrial asset operations show limited accuracy when reasoning over flat document stores. AssetOpsBench (KDD 2026) establishes that GPT-4 agents achieve 65% on 139 industrial maintenance scenarios backed by CouchDB,…

Databases · Computer Science 2026-05-27 Madhulatha Mandarapu, Sandeep Kunkunuru