Related papers: Efficient Function Orchestration for Large Languag…

Asynchronous LLM Function Calling

Large language models (LLMs) use function calls to interface with external tools and data sources. However, the current approach to LLM function calling is inherently synchronous, where each call blocks LLM inference, limiting LLM operation…

Computation and Language · Computer Science 2024-12-11 In Gim, Seung-seob Lee, Lin Zhong

An LLM Compiler for Parallel Function Calling

The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has…

Computation and Language · Computer Science 2024-06-06 Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Multi-LLM Orchestration for High-Quality Code Generation: Exploiting Complementary Model Strengths

Large Language Models (LLMs) have become central to automated code generation, yet existing approaches operate within a single-LLM paradigm: one model is selected and applied throughout the entire generation process. We observe that…

Software Engineering · Computer Science 2026-04-21 Huashan Chen, Zhenyu Qi, Haotang Li, Hong Chen, Jinfu Chen, Kebin Peng, In Kee Kim, Kyu Hyung Lee, Sen He, Weiyi Shang

Efficient Distributed MLLM Training with Cornstarch

Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. However, this inherent heterogeneity in MLLM…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Insu Jang, Runyu Lu, Nikhil Bansal, Ang Chen, Mosharaf Chowdhury

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks

Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. While these models excel in general complex reasoning tasks, they still face challenges in…

Artificial Intelligence · Computer Science 2024-10-25 Graziano A. Manduzio, Federico A. Galatolo, Mario G. C. A. Cimino, Enzo Pasquale Scilingo, Lorenzo Cominelli

An LLM-Tool Compiler for Fused Parallel Function Calling

State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional…

Programming Languages · Computer Science 2024-05-29 Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis

OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon that the proportion of a certain modality…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-13 Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu

Hierarchical Memory Management for Mutable State

It is well known that modern functional programming languages are naturally amenable to parallel programming. Achieving efficient parallelism using functional languages, however, remains difficult. Perhaps the most important reason for this…

Programming Languages · Computer Science 2018-02-20 Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, Matthew Fluet

AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence

As large language models from diverse providers converge toward comparable benchmark performance, the traditional paradigm of selecting a single best model per task yields diminishing returns. We argue that orchestration topology -- the…

Multiagent Systems · Computer Science 2026-02-20 Geunbin Yu

SGLang: Efficient Execution of Structured Language Model Programs

Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming…

Artificial Intelligence · Computer Science 2024-06-07 Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

ELHPlan: Efficient Long-Horizon Task Planning for Multi-Agent Collaboration

Large Language Models (LLMs) enable intelligent multi-robot collaboration but face fundamental trade-offs: open-loop methods that compile tasks into formal representations for external executors produce sound plans but lack adaptability in…

Artificial Intelligence · Computer Science 2026-03-10 Shaobin Ling, Yun Wang, Chenyou Fan, Tin Lun Lam, Junjie Hu

Less is More: Optimizing Function Calling for LLM Execution on Edge Devices

The advanced function-calling capabilities of foundation models open up new possibilities for deploying agents to perform complex API tasks. However, managing large amounts of data and interacting with numerous APIs makes function calling…

Performance · Computer Science 2024-11-26 Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis

A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models

Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, and general-purpose tasks, e.g., text…

Computation and Language · Computer Science 2025-02-11 Peiqin Lin, André F. T. Martins, Hinrich Schütze

APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and…

Artificial Intelligence · Computer Science 2024-06-21 Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si

Connecting Large Language Model Agent to High Performance Computing Resource

The Large Language Model agent workflow enables the LLM to invoke tool functions to improve performance on questions in specific scientific domains. Tackling large-scale scientific research requires access to computing resources and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-19 Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan

Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes,…

Computation and Language · Computer Science 2026-05-15 Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez

From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation

Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts:…

Programming Languages · Computer Science 2026-02-27 Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger

Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling

Function calling is a complex task with widespread applications in domains such as information retrieval, software engineering and automation. For example, a query to book the shortest flight from New York to London on January 15 requires…

Artificial Intelligence · Computer Science 2025-04-29 Ishan Kavathekar, Raghav Donakanti, Ponnurangam Kumaraguru, Karthik Vaidhyanathan

LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems

Modern deployment of large language models (LLMs) frequently involves both inference serving and continuous retraining to stay aligned with evolving data and user feedback. Common practices separate these workloads onto distinct servers in…

Artificial Intelligence · Computer Science 2025-07-30 Yufei Li, Zexin Li, Yinglun Zhu, Cong Liu

ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism

Multimodal large language models (MLLMs) extend LLMs to handle images, videos, and audio by incorporating feature extractors and projection modules. However, these additional components -- combined with complex inference pipelines and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-12 Zedong Liu, Shenggan Cheng, Guangming Tan, Yang You, Dingwen Tao