Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

Computation and Language | 2026-05-15 | v1 | Artificial Intelligence | Machine Learning

Abstract

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning and no changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
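
The decoupling described in the abstract can be illustrated with a minimal Python sketch. This is not the AsyncFC implementation. It only shows the general future-based pattern under stated assumptions: the `FutureTable` class, the `$future_N` handle format, and the two toy tools are hypothetical names invented for illustration, and model decoding is simulated with a sleep.

```python
import asyncio
import time


# Hypothetical tools standing in for unmodified function implementations.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.2)  # simulated tool latency
    return f"sunny in {city}"


async def get_time(city: str) -> str:
    await asyncio.sleep(0.2)  # simulated tool latency
    return f"12:00 in {city}"


class FutureTable:
    """Maps symbolic handles to in-flight tool calls (illustrative only)."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def submit(self, coro) -> str:
        # Start the call immediately and return a placeholder right away,
        # so the caller is not blocked on the result.
        handle = f"$future_{len(self._tasks)}"
        self._tasks[handle] = asyncio.create_task(coro)
        return handle

    async def resolve(self, handle: str) -> str:
        # Block only at the point where the concrete value is needed.
        return await self._tasks[handle]


async def run_demo():
    table = FutureTable()
    start = time.perf_counter()

    # Two independent calls are issued back to back (inter-function parallelism).
    h1 = table.submit(get_weather("Paris"))
    h2 = table.submit(get_time("Paris"))

    # Simulated decoding that proceeds while both tools are still running.
    await asyncio.sleep(0.2)

    results = [await table.resolve(h1), await table.resolve(h2)]
    elapsed = time.perf_counter() - start
    return [h1, h2], results, elapsed


handles, results, elapsed = asyncio.run(run_demo())
print(handles, results, f"{elapsed:.2f}s")
```

In this toy setting the two tool calls and the simulated decoding step each take about 0.2 s, so a strictly sequential schedule would need roughly 0.6 s, while the overlapped schedule finishes in roughly the time of the slowest step. How a real system feeds the handles back to the model and tracks dependencies between calls is outside what this sketch covers.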

Cite

@article{arxiv.2605.15077,
  title  = {Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs},
  author = {Guangyu Feng and Huanzhi Mao and Prabal Dutta and Joseph E. Gonzalez},
  journal = {arXiv preprint arXiv:2605.15077},
  year   = {2026}
}