Related papers: SimpleTool: Parallel Decoding for Real-Time LLM Fu…

200 papers

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dimension of tool use, especially the impact of tool response…

Artificial Intelligence · Computer Science 2026-05-29 Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie, Zhen Fang, Qiuchen Wang, Lin Chen, Huaian Chen, Zehui Chen, Feng Zhao

Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary reasoning capacity. Existing collaborative…

Computation and Language · Computer Science 2026-01-09 Chengsong Huang, Tong Zheng, Langlin Huang, Jinyuan Li, Haolin Liu, Jiaxin Huang

State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional…

Programming Languages · Computer Science 2024-05-29 Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes,…

Computation and Language · Computer Science 2026-05-15 Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez

The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has…

Computation and Language · Computer Science 2024-06-06 Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is…

Computation and Language · Computer Science 2025-04-01 Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass…

Computation and Language · Computer Science 2024-08-26 Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang

The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these…

Machine Learning · Computer Science 2025-10-01 Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet,…

Computation and Language · Computer Science 2025-10-01 Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-31 Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang

Finetuning large language models (LLMs) is essential for task adaptation, yet today's serving stacks isolate inference and finetuning on separate GPU clusters -- wasting resources and under-utilizing hardware. We introduce FlexLLM, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Gabriele Oliaro, Xupeng Miao, Xinhao Cheng, Vineeth Kada, Mengdi Wu, Ruohan Gao, Yingyi Huang, Remi Delacourt, April Yang, Yingcheng Wang, Colin Unger, Zhihao Jia

Large Language Models (LLM) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts:…

Programming Languages · Computer Science 2026-02-27 Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger

Large Language Models (LLMs) have exhibited exceptional performance across a spectrum of natural language processing tasks. However, their substantial sizes pose considerable challenges, particularly in computational demands and inference…

Computation and Language · Computer Science 2025-06-03 Guoxuan Chen, Han Shi, Jiawei Li, Yihang Gao, Xiaozhe Ren, Yimeng Chen, Xin Jiang, Zhenguo Li, Weiyang Liu, Chao Huang

Low-latency decoding for large language models (LLMs) is crucial for applications like chatbots and code assistants, yet generating long outputs remains slow in single-query settings. Prior work on speculative decoding (which combines a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-16 Ziyi Zhang, Ziheng Jiang, Chengquan Jiang, Menghan Yu, Size Zheng, Haibin Lin, Henry Hoffmann, Xin Liu

Large Language Models (LLMs) have exhibited significant potential in performing diverse tasks, including the ability to call functions or use external tools to enhance their performance. While current research on function calling by LLMs…

Computation and Language · Computer Science 2025-03-04 Mingyang Chen, Haoze Sun, Tianpeng Li, Fan Yang, Hao Liang, Keer Lu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen

Tool-augmented large language models (LLMs), hereafter LLM agents, leverage external tools to solve diverse tasks and interface with the real world. However, current training practices largely rely on supervised fine-tuning (SFT) over…

Machine Learning · Computer Science 2026-03-18 Weihua Du, Hailei Gong, Zhan Ling, Kang Liu, Lingfeng Shen, Xuesong Yao, Yufei Xu, Dingyuan Shi, Yiming Yang, Jiecao Chen

Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly…

Computation and Language · Computer Science 2024-04-19 Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face…

Artificial Intelligence · Computer Science 2024-10-22 Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, largely overlooks…

Artificial Intelligence · Computer Science 2026-05-26 Yijuan Liang, Xinghao Chen, Yifan Ge, Ziyi Wu, Hao Wu, Changyu Zeng, Wei Xing, Xiaoyu Shen

Cost of serving large language models (LLM) is high, but the expensive and scarce GPUs are poorly efficient when generating tokens sequentially, unless the batch of sequences is enlarged. However, the batch size is limited by some…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-19 Jiaao He, Jidong Zhai