Related papers: Adaptive GPU Resource Allocation for Multi-Agent C…
AI agents are emerging as a dominant workload in a wide range of applications, promising to be the vehicle that delivers the promised benefits of AI to enterprises and consumers. Unlike conventional software or static inference, agentic…
The surge in generative AI workloads has created a need for scalable inference systems that can flexibly harness both GPUs and specialized accelerators while containing operational costs. This paper proposes a hardware-agnostic control loop…
This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems. It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning. The method…
Foundation models have reshaped AI by unifying fragmented architectures into scalable backbones with multimodal reasoning and contextual adaptation. In parallel, the long-standing notion of AI agents, defined by the sensing-decision-action…
This paper addresses the challenges of rapid resource variation and highly uncertain task loads in cloud computing environments. It proposes an optimization method for elastic cloud resource scaling based on a multi-agent system. The method…
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.…
Virtual Reality Cloud Gaming (VR-CG) represents a demanding class of immersive applications, requiring high bandwidth, ultra-low latency, and intelligent resource management to ensure optimal user experience. In this paper, we propose a…
This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput. Our design begins by enabling effective…
As distributed artificial intelligence (AI) and multi-agent architectures grow increasingly complex, the need for adaptive, context-aware routing becomes paramount. This paper introduces an enhanced, adaptive routing algorithm tailored for…
The growing demand for computational resources in machine learning has made efficient resource allocation a critical challenge, especially in heterogeneous hardware clusters where devices vary in capability, age, and energy efficiency.…
Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not…
Large language models (LLMs) are increasingly deployed as AI agents that operate in short reasoning-action loops, interleaving model computation with external calls. Unlike traditional chat applications, these agentic workloads require…
Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…
Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. It promises the user services at large scale and low cost while eliminating the need for infrastructure management. On cloud provider…
Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning, an adaptive, multi-step process that coordinates with external tools. This shift from static, single-turn inference to…
This paper proposes a highly robust autonomous agent framework based on the ReAct paradigm, designed to solve complex tasks through adaptive decision making and multi-agent collaboration. Unlike traditional frameworks that rely on fixed…
Large Language Models (LLMs) demonstrate impressive ability in handling reasoning tasks. However, unlike humans who can instinctively adapt their problem-solving strategies to the complexity of task, most LLM-based methods adopt a…
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their…
Allocation and planning with a collection of tasks and a group of agents is an important problem in multiagent systems. One commonly faced bottleneck is scalability, as in general the multiagent model increases exponentially in size with…
Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as collaborative…