Related papers: Asynchronous LLM Function Calling

Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes,…

Computation and Language · Computer Science · 2026-05-15 · Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dimension of tool use, especially the impact of tool response…

Artificial Intelligence · Computer Science · 2026-05-29 · Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie, Zhen Fang, Qiuchen Wang, Lin Chen, Huaian Chen, Zehui Chen, Feng Zhao

An LLM Compiler for Parallel Function Calling

The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has…

Computation and Language · Computer Science · 2024-06-06 · Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

AsyncMLD: Asynchronous Multi-LLM Framework for Dialogue Recommendation System

We have reached a practical and realistic phase in human-support dialogue agents by developing a large language model (LLM). However, when requiring expert knowledge or anticipating the utterance content using the massive size of the…

Human-Computer Interaction · Computer Science · 2023-12-22 · Naoki Yoshimaru, Motoharu Okuma, Takamasa Iio, Kenji Hatano

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Large language models (LLMs) have surged in popularity and are extensively used in commercial applications, where the efficiency of model serving is crucial for the user experience. Most current research focuses on optimizing individual…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-09-12 · Yuhang Yao, Han Jin, Alay Dilipbhai Shah, Shanshan Han, Zijian Hu, Yide Ran, Dimitris Stripelis, Zhaozhuo Xu, Salman Avestimehr, Chaoyang He

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

Despite real-time planners exhibiting remarkable performance in autonomous driving, the growing exploration of Large Language Models (LLMs) has opened avenues for enhancing the interpretability and controllability of motion planning.…

Robotics · Computer Science · 2024-07-25 · Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang, Lijun Zhang, Si Liu

Efficient Function Orchestration for Large Language Models

Function calling is a fundamental capability of today's large language models, but sequential function calling posed efficiency problems. Recent studies have proposed to request function calls with parallelism support in order to alleviate…

Software Engineering · Computer Science · 2025-10-30 · Xiaoxia Liu, Peng Di, Cong Li, Jun Sun, Jingyi Wang

ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback

Large Language Models (LLMs) have made significant strides in Natural Language Processing and coding, yet they struggle with robustness and accuracy in complex function calls. To tackle these challenges, this paper introduces ADC, an…

Software Engineering · Computer Science · 2024-12-30 · Wei Zhang, Yi Zhang, Li Zhu, Qianghuai Jia, Feijun Jiang, Hongcheng Guo, Zhoujun Li, Mengping Zhou

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation

As large language models (LLMs) have shown great success in many tasks, they are used in various applications. While a lot of works have focused on the efficiency of single-LLM application (e.g., offloading, request scheduling, parallelism…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-24 · Jingzhi Fang, Yanyan Shen, Yue Wang, Lei Chen

Fast Inference for Augmented Large Language Models

Augmented Large Language Models (LLMs) enhance the capabilities of standalone LLMs by integrating external data sources through API calls. In interactive LLM applications, efficient scheduling is crucial for maintaining low request…

Machine Learning · Computer Science · 2024-10-29 · Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL…

Machine Learning · Computer Science · 2025-07-03 · Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an…

Computation and Language · Computer Science · 2023-05-30 · Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

An LLM-Tool Compiler for Fused Parallel Function Calling

State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional…

Programming Languages · Computer Science · 2024-05-29 · Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents

Large language model (LLM)-based multi-agent systems have demonstrated remarkable promise for tackling complex tasks by breaking them down into subtasks that are iteratively planned, executed, observed, and refined. Despite their…

Multiagent Systems · Computer Science · 2025-07-15 · Enhao Zhang, Erkang Zhu, Gagan Bansal, Adam Fourney, Hussein Mozannar, Jack Gerrits

SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization

Large language models (LLMs) have been a disruptive innovation in recent years, and they play a crucial role in our daily lives due to their ability to understand and generate human-like text. Their capabilities include natural language…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-17 · Akrit Mudvari, Yuang Jiang, Leandros Tassiulas

LAST SToP For Modeling Asynchronous Time Series

We present a novel prompt design for Large Language Models (LLMs) tailored to Asynchronous Time Series. Unlike regular time series, which assume values at evenly spaced time points, asynchronous time series consist of timestamped events…

Machine Learning · Computer Science · 2025-02-05 · Shubham Gupta, Thibaut Durand, Graham Taylor, Lilian W. Białokozowicz

Asynchronous Tool Usage for Real-Time Agents

While frontier large language models (LLMs) are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to passage of time. This synchronous design forces user queries and tool-use to occur…

Artificial Intelligence · Computer Science · 2024-10-30 · Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John Emmons

Llumnix: Dynamic Scheduling for Large Language Model Serving

Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and…

Hardware Architecture · Computer Science · 2024-06-07 · Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks

Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. While these models excel in general complex reasoning tasks, they still face challenges in…

Artificial Intelligence · Computer Science · 2024-10-25 · Graziano A. Manduzio, Federico A. Galatolo, Mario G. C. A. Cimino, Enzo Pasquale Scilingo, Lorenzo Cominelli

SLO-Aware Scheduling for Large Language Model Inferences

Large language models (LLMs) have revolutionized applications such as code completion, chatbots, and online classification. To elevate user experiences, service level objectives (SLOs) serve as crucial benchmarks for assessing inference…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-13 · Jinqi Huang, Yi Xiong, Xuebing Yu, Wenjie Huang, Entong Li, Li Zeng, Xin Chen