Related papers: API Pack: A Massive Multi-Programming Language Dat…

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2)…

Computation and Language · Computer Science · 2023-10-26 · Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li

A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models

Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation, significantly enhancing productivity and accelerating software development. However, existing benchmarks primarily focus on…

Software Engineering · Computer Science · 2024-09-27 · Yixi Wu, Pengfei He, Zehao Wang, Shaowei Wang, Yuan Tian, Tse-Hsun Chen

Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

The proliferation of Large Language Models like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel in text-based tasks, overlooking…

Computation and Language · Computer Science · 2023-11-23 · Xiao Liu, Jianfeng Lin, Jiawei Zhang

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire…

Computation and Language · Computer Science · 2024-05-21 · Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, Luis A. Lastras

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for…

Computation and Language · Computer Science · 2024-06-27 · Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current…

Artificial Intelligence · Computer Science · 2023-10-04 · Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun

MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for…

Computation and Language · Computer Science · 2025-03-20 · Tianyu Han, Lisa C. Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexei Figueroa, Alexander Löser, Daniel Truhn, Keno K. Bressem

Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation

As Large Language Models (LLMs) advance in natural language processing, there is growing interest in leveraging their capabilities to simplify software interactions. In this paper, we propose a novel system that integrates LLMs for both…

Computation and Language · Computer Science · 2024-09-19 · Chunliang Tao, Xiaojing Fan, Yahe Yang

OctoPack: Instruction Tuning Code Large Language Models

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…

Computation and Language · Computer Science · 2024-02-20 · Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

Optimizing Large Language Models for OpenAPI Code Completion

Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream…

Software Engineering · Computer Science · 2024-06-12 · Bohdan Petryshyn, Mantas Lukoševičius

CodeScholar: Growing Idiomatic Code Examples

Programmers often search for usage examples for API methods. A tool that could generate realistic, idiomatic, and contextual usage examples for one or more APIs would be immensely beneficial to developers. Such a tool would relieve the need…

Software Engineering · Computer Science · 2023-12-27 · Manish Shetty, Koushik Sen, Ion Stoica

An Effective Data Creation Pipeline to Generate High-quality Financial Instruction Data for Large Language Model

At the beginning era of large language model, it is quite critical to generate a high-quality financial dataset to fine-tune a large language model for financial related tasks. Thus, this paper presents a carefully designed data creation…

Computation and Language · Computer Science · 2023-08-04 · Ziao Wang, Jianning Wang, Junda Wu, Xiaofeng Zhang

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of…

Computation and Language · Computer Science · 2023-05-24 · Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, Bowen Zhou

Training Data for Large Language Model

In 2022, with the release of ChatGPT, large-scale language models gained widespread attention. ChatGPT not only surpassed previous models in terms of parameters and the scale of its pretraining corpus but also achieved revolutionary…

Artificial Intelligence · Computer Science · 2024-11-13 · Yiming Ju, Huanhuan Ma

Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy

The use of generative AI-based coding assistants like ChatGPT and Github Copilot is a reality in contemporary software development. Many of these tools are provided as remote APIs. Using third-party APIs raises data privacy and security…

Software Engineering · Computer Science · 2025-01-20 · Negar Alizadeh, Boris Belchev, Nishant Saurabh, Patricia Kelbert, Fernando Castor

Jigsaw: Large Language Models meet Program Synthesis

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science · 2021-12-07 · Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

An approach for API synthesis using large language models

APIs play a pivotal role in modern software development by enabling seamless communication and integration between various systems, applications, and services. Component-based API synthesis is a form of program synthesis that constructs an…

Software Engineering · Computer Science · 2025-02-24 · Hua Zhong, Shan Jiang, Sarfraz Khurshid

Evaluating and Mitigating Errors in LLM-Generated Web API Integrations

API integration is a cornerstone of our digital infrastructure, enabling software systems to connect and interact. However, as shown by many studies, writing or generating correct code to invoke APIs, particularly web APIs, is challenging.…

Software Engineering · Computer Science · 2025-12-19 · Daniel Maninger, Leon Chemnitz, Amir Molzam Sharifloo, Tushar Lamba, Jannis Brugger, Mira Mezini

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code. We present methods for thoroughly extracting samples that use both…

Computation and Language · Computer Science · 2023-10-31 · Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui

ToolCoder: Teach Code Generation Models to use API search tools

Automatically generating source code from natural language descriptions has been a growing field of research in recent years. However, current large-scale code generation models often encounter difficulties when selecting appropriate APIs…

Software Engineering · Computer Science · 2023-09-12 · Kechi Zhang, Huangzhao Zhang, Ge Li, Jia Li, Zhuo Li, Zhi Jin