Related papers: ToolFuzz -- Automated Agent Tool Testing

200 papers

LLM-based tool agents offer natural language interfaces, enabling users to seamlessly interact with computing services. While REST APIs are valuable resources for building such agents, they must first be transformed into AI-compatible…

Machine Learning · Computer Science 2025-01-29 Xinyi Ni, Qiuyang Wang, Yukun Zhang, Pengyu Hong

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context…

Computation and Language · Computer Science 2025-03-05 Zhengliang Shi, Shen Gao, Lingyong Yan, Yue Feng, Xiuyi Chen, Zhumin Chen, Dawei Yin, Suzan Verberne, Zhaochun Ren

Evaluating Large Language Models (LLMs) is one of the most critical aspects of building a performant compound AI system. Since the output from LLMs propagates to downstream steps, identifying LLM errors is crucial to system performance. A…

Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers,…

Computation and Language · Computer Science 2025-06-02 Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, Jakob Nikolas Kather

Large language model (LLM) agents rely on external tools to solve complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, introducing ambiguity and reducing selection accuracy. LLMs also…

Computation and Language · Computer Science 2026-05-12 Marianne Menglin Liu, Daniel Garcia, Fjona Parllaku, Vikas Upadhyay, Syed Fahad Allam Shah, Dan Roth

Large Language Models (LLMs) are increasingly used to build autonomous agents that perform complex tasks with external tools, often exposed through APIs in enterprise systems. Direct use of these APIs is difficult due to the complex input…

Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses.…

Artificial Intelligence · Computer Science 2024-05-20 Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still hinders the development of stronger LLM agents.…

Artificial Intelligence · Computer Science 2025-12-08 Chen Yang, Ran Le, Yun Xing, Zhenwei An, Zongchao Chen, Wayne Xin Zhao, Yang Song, Tao Zhang

Tool-augmented large language models (LLMs) are increasingly employed in real-world applications, but tool usage errors still hinder their reliability. We introduce ToolCritic, a diagnostic framework that evaluates and improves LLM behavior…

Artificial Intelligence · Computer Science 2025-10-21 Hassan Hamad, Yingru Xu, Liang Zhao, Wenbo Yan, Narendra Gyanchandani

Multi-agent systems powered by large language models (LLMs) are transforming enterprise automation, yet systematic evaluation methodologies for assessing tool-use reliability remain underdeveloped. We introduce a comprehensive diagnostic…

Artificial Intelligence · Computer Science 2026-01-26 Donghao Huang, Gauri Malwe, Zhaoxia Wang

Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. However, the effective execution of these tools…

Computation and Language · Computer Science 2026-04-30 Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current…

Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has become increasingly popular. However, a major open question is how to validate that the generated models are correct and…

Artificial Intelligence · Computer Science 2026-04-07 Alexander Zadorojniy, Segev Wasserkrug, Eitan Farchi

In recent years, machine learning (ML) based software systems are increasingly deployed in several critical applications, yet systematic testing of their behavior remains challenging due to complex model architectures, large input spaces,…

Software Engineering · Computer Science 2026-03-17 Fadel Mamar Seydou, Arnab Sharma

Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks triggered lots of exploration of using them as central controllers to build agent systems. Multiple studies focus…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

While Large Language Models (LLMs) have evolved into tool-using agents, they remain brittle in long-horizon interactions. Unlike mathematical reasoning where errors are often rectifiable via backtracking, tool-use failures frequently induce…

Artificial Intelligence · Computer Science 2026-03-17 Shengda Fan, Xuyan Ye, Yupeng Huo, Zhi-Yuan Chen, Yiju Guo, Shenzhi Yang, Wenkai Yang, Shuqi Ye, Jingwen Chen, Haotian Chen, Xin Cong, Yankai Lin

As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is…

Computation and Language · Computer Science 2025-04-01 Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Tool-augmented Large Language Models (TaLLMs) extend LLMs with the ability to invoke external tools, enabling them to interact with real-world environments. However, a major limitation in deploying TaLLMs in sensitive applications such as…

Software Engineering · Computer Science 2026-03-24 Cailin Winston, Claris Winston, René Just

To address intricate real-world tasks, there has been a rising interest in tool utilization in applications of large language models (LLMs). To develop LLM-based agents, it usually requires LLMs to understand many tool functions from…

Computation and Language · Computer Science 2024-03-28 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang