Related papers: ToolFuzz -- Automated Agent Tool Testing

ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations

LLM-based tool agents offer natural language interfaces, enabling users to seamlessly interact with computing services. While REST APIs are valuable resources for building such agents, they must first be transformed into AI-compatible…

Machine Learning · Computer Science 2025-01-29 Xinyi Ni , Qiuyang Wang , Yukun Zhang , Pengyu Hong

Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context…

Computation and Language · Computer Science 2025-03-05 Zhengliang Shi , Shen Gao , Lingyong Yan , Yue Feng , Xiuyi Chen , Zhumin Chen , Dawei Yin , Suzan Verberne , Zhaochun Ren

ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs

Evaluating Large Language Models (LLMs) is one of the most critical aspects of building a performant compound AI system. Since the output from LLMs propagates to downstream steps, identifying LLM errors is crucial to system performance. A…

Software Engineering · Computer Science 2025-06-27 Shirley Kokane , Ming Zhu , Tulika Awalgaonkar , Jianguo Zhang , Thai Hoang , Akshara Prabhakar , Zuxin Liu , Tian Lan , Liangwei Yang , Juntao Tan , Rithesh Murthy , Weiran Yao , Zhiwei Liu , Juan Carlos Niebles , Huan Wang , Shelby Heinecke , Caiming Xiong , Silvio Savarese

LLM Agents Making Agent Tools

Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers,…

Computation and Language · Computer Science 2025-06-02 Georg Wölflein , Dyke Ferber , Daniel Truhn , Ognjen Arandjelović , Jakob Nikolas Kather

ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering

Large language model (LLM) agents rely on external tools to solve complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, introducing ambiguity and reducing selection accuracy. LLMs also…

Computation and Language · Computer Science 2026-05-12 Marianne Menglin Liu , Daniel Garcia , Fjona Parllaku , Vikas Upadhyay , Syed Fahad Allam Shah , Dan Roth

A Framework for Testing and Adapting REST APIs as LLM Tools

Large Language Models (LLMs) are increasingly used to build autonomous agents that perform complex tasks with external tools, often exposed through APIs in enterprise systems. Direct use of these APIs is difficult due to the complex input…

Software Engineering · Computer Science 2025-09-15 Jayachandu Bandlamudi , Ritwik Chaudhuri , Neelamadhav Gantayat , Sambit Ghosh , Kushal Mukherjee , Prerna Agarwal , Renuka Sindhgatta , Sameep Mehta

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses.…

Artificial Intelligence · Computer Science 2024-05-20 Yangjun Ruan , Honghua Dong , Andrew Wang , Silviu Pitis , Yongchao Zhou , Jimmy Ba , Yann Dubois , Chris J. Maddison , Tatsunori Hashimoto

ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset

Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still hinders the development of stronger LLM agents.…

Artificial Intelligence · Computer Science 2025-12-08 Chen Yang , Ran Le , Yun Xing , Zhenwei An , Zongchao Chen , Wayne Xin Zhao , Yang Song , Tao Zhang

ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems

Tool-augmented large language models (LLMs) are increasingly employed in real-world applications, but tool usage errors still hinder their reliability. We introduce ToolCritic, a diagnostic framework that evaluates and improves LLM behavior…

Artificial Intelligence · Computer Science 2025-10-21 Hassan Hamad , Yingru Xu , Liang Zhao , Wenbo Yan , Narendra Gyanchandani

When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems

Multi-agent systems powered by large language models (LLMs) are transforming enterprise automation, yet systematic evaluation methodologies for assessing tool-use reliability remain underdeveloped. We introduce a comprehensive diagnostic…

Artificial Intelligence · Computer Science 2026-01-26 Donghao Huang , Gauri Malwe , Zhaoxia Wang

Learning to Ask: When LLM Agents Meet Unclear Instruction

Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. However, the effective execution of these tools…

Computation and Language · Computer Science 2026-04-30 Wenxuan Wang , Juluan Shi , Zixuan Ling , Yuk-Kit Chan , Chaozheng Wang , Cheryl Lee , Youliang Yuan , Jen-tse Huang , Wenxiang Jiao , Michael R. Lyu

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current…

Artificial Intelligence · Computer Science 2023-10-04 Yujia Qin , Shihao Liang , Yining Ye , Kunlun Zhu , Lan Yan , Yaxi Lu , Yankai Lin , Xin Cong , Xiangru Tang , Bill Qian , Sihan Zhao , Lauren Hong , Runchu Tian , Ruobing Xie , Jie Zhou , Mark Gerstein , Dahai Li , Zhiyuan Liu , Maosong Sun

An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models

Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has become increasingly popular. However, a major open question is how to validate that the generated models are correct and…

Artificial Intelligence · Computer Science 2026-04-07 Alexander Zadorojniy , Segev Wasserkrug , Eitan Farchi

DeepFix: Debugging and Fixing Machine Learning Workflow using Agentic AI

In recent years, machine learning (ML) based software systems are increasingly deployed in several critical applications, yet systematic testing of their behavior remains challenging due to complex model architectures, large input spaces,…

Software Engineering · Computer Science 2026-03-17 Fadel Mamar Seydou , Arnab Sharma

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks triggered lots of exploration of using them as central controllers to build agent systems. Multiple studies focus…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Chenyu Wang , Weixin Luo , Sixun Dong , Xiaohua Xuan , Zhengxin Li , Lin Ma , Shenghua Gao

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

While Large Language Models (LLMs) have evolved into tool-using agents, they remain brittle in long-horizon interactions. Unlike mathematical reasoning where errors are often rectifiable via backtracking, tool-use failures frequently induce…

Artificial Intelligence · Computer Science 2026-03-17 Shengda Fan , Xuyan Ye , Yupeng Huo , Zhi-Yuan Chen , Yiju Guo , Shenzhi Yang , Wenkai Yang , Shuqi Ye , Jingwen Chen , Haotian Chen , Xin Cong , Yankai Lin

ToolGen: Unified Tool Retrieval and Calling via Generation

As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is…

Computation and Language · Computer Science 2025-04-01 Renxi Wang , Xudong Han , Lei Ji , Shu Wang , Timothy Baldwin , Haonan Li

Where LLM Agents Fail and How They can Learn From Failures

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Artificial Intelligence · Computer Science 2025-10-01 Kunlun Zhu , Zijia Liu , Bingxuan Li , Muxin Tian , Yingxuan Yang , Jiaxun Zhang , Pengrui Han , Qipeng Xie , Fuyang Cui , Weijia Zhang , Xiaoteng Ma , Xiaodong Yu , Gowtham Ramesh , Jialian Wu , Zicheng Liu , Pan Lu , James Zou , Jiaxuan You

Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents

Tool-augmented Large Language Models (TaLLMs) extend LLMs with the ability to invoke external tools, enabling them to interact with real-world environments. However, a major limitation in deploying TaLLMs in sensitive applications such as…

Software Engineering · Computer Science 2026-03-24 Cailin Winston , Claris Winston , René Just

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

To address intricate real-world tasks, there has been a rising interest in tool utilization in applications of large language models (LLMs). To develop LLM-based agents, it usually requires LLMs to understand many tool functions from…

Computation and Language · Computer Science 2024-03-28 Siyu Yuan , Kaitao Song , Jiangjie Chen , Xu Tan , Yongliang Shen , Ren Kan , Dongsheng Li , Deqing Yang