Related papers: Adaptive GPU Resource Allocation for Multi-Agent C…

Efficient and Scalable Agentic AI with Heterogeneous Systems

AI agents are emerging as a dominant workload in a wide range of applications, promising to be the vehicle that delivers the promised benefits of AI to enterprises and consumers. Unlike conventional software or static inference, agentic…

Machine Learning · Computer Science 2025-07-29 Zain Asgar, Michelle Nguyen, Sachin Katti

Adaptive Orchestration for Large-Scale Inference on Heterogeneous Accelerator Systems Balancing Cost, Performance, and Resilience

The surge in generative AI workloads has created a need for scalable inference systems that can flexibly harness both GPUs and specialized accelerators while containing operational costs. This paper proposes a hardware-agnostic control loop…

Performance · Computer Science 2025-03-28 Yahav Biran, Imry Kissos

Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters

This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems. It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning. The method…

Machine Learning · Computer Science 2025-08-15 Guanzi Yao, Heyao Liu, Linyan Dai

Adaptive and Resource-efficient Agentic AI Systems for Mobile and Embedded Devices: A Survey

Foundation models have reshaped AI by unifying fragmented architectures into scalable backbones with multimodal reasoning and contextual adaptation. In parallel, the long-standing notion of AI agents, defined by the sensing-decision-action…

Machine Learning · Computer Science 2025-10-02 Sicong Liu, Weiye Wu, Xiangrui Xu, Teng Li, Bowen Pang, Bin Guo, Zhiwen Yu

Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling

This paper addresses the challenges of rapid resource variation and highly uncertain task loads in cloud computing environments. It proposes an optimization method for elastic cloud resource scaling based on a multi-agent system. The method…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-02 Bruce Fang, Danyi Gao

Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach

Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.…

Artificial Intelligence · Computer Science 2023-04-18 Siyue Zhang, Minrui Xu, Wei Yang Bryan Lim, Dusit Niyato

Toward Scalable VR-Cloud Gaming: An Attention-aware Adaptive Resource Allocation Framework for 6G Networks

Virtual Reality Cloud Gaming (VR-CG) represents a demanding class of immersive applications, requiring high bandwidth, ultra-low latency, and intelligent resource management to ensure optimal user experience. In this paper, we propose a…

Networking and Internet Architecture · Computer Science 2026-01-06 Gabriel Almeida, João Paulo Esper, Cleverson Nahum, Aldebaro Klautau, Kleber Vieira Cardoso

Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization

This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput. Our design begins by enabling effective…

Operating Systems · Computer Science 2025-09-26 Yuhang Xu, Shengzhong Liu, Dong Zhang, Bingheng Yan, Fan Wu, Guihai Chen

Adaptive routing protocols for determining optimal paths in AI multi-agent systems: a priority- and learning-enhanced approach

As distributed artificial intelligence (AI) and multi-agent architectures grow increasingly complex, the need for adaptive, context-aware routing becomes paramount. This paper introduces an enhanced, adaptive routing algorithm tailored for…

Multiagent Systems · Computer Science 2025-03-12 Theodor Panayotov, Ivo Emanuilov

GOGH: Correlation-Guided Orchestration of GPUs in Heterogeneous Clusters

The growing demand for computational resources in machine learning has made efficient resource allocation a critical challenge, especially in heterogeneous hardware clusters where devices vary in capability, age, and energy efficiency.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-20 Ahmad Raeisi, Mahdi Dolati, Sina Darabi, Sadegh Talebi, Patrick Eugster, Ahmad Khonsari

Efficient Collaborative Multi-Agent Deep Reinforcement Learning for Large-Scale Fleet Management

Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not…

Multiagent Systems · Computer Science 2019-12-03 Kaixiang Lin, Renyu Zhao, Zhe Xu, Jiayu Zhou

AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU

Large language models (LLMs) are increasingly deployed as AI agents that operate in short reasoning-action loops, interleaving model computation with external calls. Unlike traditional chat applications, these agentic workloads require…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-12 Yuning Zhang, Yan Yan, Nan Yang, Dong Yuan

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…

Operating Systems · Computer Science 2026-05-01 Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments

Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. It promises the user services at large scale and low cost while eliminating the need for infrastructure management. On cloud provider…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Lucia Schuler, Somaya Jamil, Niklas Kühl

The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective

Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning, an adaptive, multi-step process that coordinates with external tools. This shift from static, single-turn inference to…

Machine Learning · Computer Science 2026-01-08 Jiin Kim, Byeongjun Shin, Jinha Chung, Minsoo Rhu

Autono: A ReAct-Based Highly Robust Autonomous Agent Framework

This paper proposes a highly robust autonomous agent framework based on the ReAct paradigm, designed to solve complex tasks through adaptive decision making and multi-agent collaboration. Unlike traditional frameworks that rely on fixed…

Multiagent Systems · Computer Science 2025-04-09 Zihao Wu

Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning

Large Language Models (LLMs) demonstrate impressive ability in handling reasoning tasks. However, unlike humans who can instinctively adapt their problem-solving strategies to the complexity of task, most LLM-based methods adopt a…

Computation and Language · Computer Science 2024-12-24 Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang

Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems

Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their…

Multiagent Systems · Computer Science 2026-02-13 Chengxuan Xia, Qianye Wu, Sixuan Tian, Yilun Hao

Multi-Objective Task Assignment and Multiagent Planning with Hybrid GPU-CPU Acceleration

Allocation and planning with a collection of tasks and a group of agents is an important problem in multiagent systems. One commonly faced bottleneck is scalability, as in general the multiagent model increases exponentially in size with…

Multiagent Systems · Computer Science 2023-05-09 Thomas Robinson, Guoxin Su

Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models

Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as collaborative…

Multiagent Systems · Computer Science 2026-01-01 Shaurya Mallampati, Rashed Shelim, Walid Saad, Naren Ramakrishnan