Related papers: Evaluating and Mitigating Errors in LLM-Generated …

APITestGenie: Generating Web API Tests from Requirements and API Specifications with LLMs

Modern software systems rely heavily on Web APIs, yet creating meaningful and executable test scripts remains a largely manual, time-consuming, and error-prone task. In this paper, we present APITestGenie, a novel tool that leverages Large…

Software Engineering · Computer Science 2026-04-03 André Pereira, Bruno Lima, João Pascoal Faria

A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models

Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation, significantly enhancing productivity and accelerating software development. However, existing benchmarks primarily focus on…

Software Engineering · Computer Science 2024-09-27 Yixi Wu, Pengfei He, Zehao Wang, Shaowei Wang, Yuan Tian, Tse-Hsun Chen

APITestGenie: Automated API Test Generation through Generative AI

Intelligent assistants powered by Large Language Models (LLMs) can generate program and test code with high accuracy, boosting developers' and testers' productivity. However, there is a lack of studies exploring LLMs for testing Web APIs,…

Software Engineering · Computer Science 2024-09-09 André Pereira, Bruno Lima, João Pascoal Faria

An approach for API synthesis using large language models

APIs play a pivotal role in modern software development by enabling seamless communication and integration between various systems, applications, and services. Component-based API synthesis is a form of program synthesis that constructs an…

Software Engineering · Computer Science 2025-02-24 Hua Zhong, Shan Jiang, Sarfraz Khurshid

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire…

Computation and Language · Computer Science 2024-05-21 Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, Luis A. Lastras

LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion

Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application…

Software Engineering · Computer Science 2025-02-14 Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

A Framework for Testing and Adapting REST APIs as LLM Tools

Large Language Models (LLMs) are increasingly used to build autonomous agents that perform complex tasks with external tools, often exposed through APIs in enterprise systems. Direct use of these APIs is difficult due to the complex input…

Software Engineering · Computer Science 2025-09-15 Jayachandu Bandlamudi, Ritwik Chaudhuri, Neelamadhav Gantayat, Sambit Ghosh, Kushal Mukherjee, Prerna Agarwal, Renuka Sindhgatta, Sameep Mehta

On the Tool Manipulation Capability of Open-source Large Language Models

Recent studies on software tool manipulation with large language models (LLMs) mostly rely on closed model APIs. The industrial adoption of these models is substantially constrained due to the security and robustness risks in exposing…

Computation and Language · Computer Science 2023-05-29 Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang

Enhancing Project-Specific Code Completion by Inferring Internal API Information

Project-specific code completion is a critical task that leverages context from a project to generate accurate code. State-of-the-art methods use retrieval-augmented generation (RAG) with large language models (LLMs) and project information…

Software Engineering · Computer Science 2025-07-29 Le Deng, Xiaoxue Ren, Chao Ni, Ming Liang, David Lo, Zhongxin Liu

Live API-Bench: 2500+ Live APIs for Testing Multi-Step Tool Calling

Large language models (LLMs) increasingly rely on external tools and APIs to execute complex tasks specified in natural language. Evaluating such tool calling capabilities in realistic enterprise settings is challenging: APIs are often…

Software Engineering · Computer Science 2026-01-27 Benjamin Elder, Anupama Murthi, Jungkoo Kang, Ankita Rajaram Naik, Kiran Kate, Kinjal Basu, Danish Contractor

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

With the rapid advancement of Generative AI technology, Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Considering that the model requires…

Computation and Language · Computer Science 2025-06-10 Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao, Tianrui Wan, Yilun Ma, Junyu Gao, Xuelong Li

Identifying and Mitigating API Misuse in Large Language Models

API misuse in code generated by large language models (LLMs) presents a serious and growing challenge in software development, as although LLMs demonstrate impressive code generation capabilities, their interactions with complex library…

Software Engineering · Computer Science 2025-12-19 Terry Yue Zhuo, Junda He, Jiamou Sun, Zhenchang Xing, David Lo, John Grundy, Xiaoning Du

AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction

Large Language Models (LLMs) can interact with the real world by connecting with versatile external APIs, resulting in better problem-solving and task automation capabilities. Previous research primarily focuses on APIs with limited…

Software Engineering · Computer Science 2024-10-29 Hongru Wang, Rui Wang, Boyang Xue, Heming Xia, Jingtao Cao, Zeming Liu, Jeff Z. Pan, Kam-Fai Wong

BaxBench: Can LLMs Generate Correct and Secure Backends?

Automatic program generation has long been a fundamental challenge in computer science. Recent benchmarks have shown that large language models (LLMs) can effectively generate code at the function level, make code edits, and solve…

Cryptography and Security · Computer Science 2025-06-02 Mark Vero, Niels Mündler, Victor Chibotaru, Veselin Raychev, Maximilian Baader, Nikola Jovanović, Jingxuan He, Martin Vechev

Let's Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference

APIs have intricate relations that can be described in text and represented as knowledge graphs to aid software engineering tasks. Existing relation extraction methods have limitations, such as limited API text corpus and affected by the…

Software Engineering · Computer Science 2023-11-03 Qing Huang, Yanbang Sun, Zhenchang Xing, Yuanlong Cao, Jieshan Chen, Xiwei Xu, Huan Jin, Jiaxing Lu

Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation

As Large Language Models (LLMs) advance in natural language processing, there is growing interest in leveraging their capabilities to simplify software interactions. In this paper, we propose a novel system that integrates LLMs for both…

Computation and Language · Computer Science 2024-09-19 Chunliang Tao, Xiaojing Fan, Yahe Yang

LLM-Generated Microservice Implementations from RESTful API Definitions

The growing need for scalable, maintainable, and fast-deploying systems has made microservice architecture widely popular in software development. This paper presents a system that uses Large Language Models (LLMs) to automate the API-first…

Software Engineering · Computer Science 2025-03-05 Saurabh Chauhan, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Jussi Rasku, Kai-Kristian Kemell, Pekka Abrahamsson

WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics

Web applications (web apps) have become a key arena for large language models (LLMs) to demonstrate their code generation capabilities and commercial potential. However, building a benchmark for LLM-generated web apps remains challenging…

Software Engineering · Computer Science 2026-03-17 Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie

Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback

This work investigates the performance of Large Language Models (LLMs) in generating ABAP code. Despite successful applications of generative AI in many programming languages, there are hardly any systematic analyses of ABAP code generation…

Software Engineering · Computer Science 2026-01-22 Stephan Wallraven, Tim Köhne, Hartmut Westenberger, Andreas Moser

RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs

The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short in replicating…

Machine Learning · Computer Science 2025-07-23 Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, Zidong Du, Qi Guo, Xing Hu