Related papers: Code Generation for Unknown Libraries via Reading …

ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration

Large language models face intrinsic limitations in coding with APIs that are unseen in their training corpora. As libraries continuously evolve, it becomes impractical to exhaustively retrain LLMs with new API knowledge. This limitation…

Software Engineering · Computer Science 2025-06-23 Yunkun Wang, Yue Zhang, Zhen Qin, Chen Zhi, Binhua Li, Fei Huang, Yongbin Li, Shuiguang Deng

Evaluating In-Context Learning of Libraries for Code Generation

Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed…

Computation and Language · Computer Science 2024-04-08 Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi

When Language Model Meets Private Library

With the rapid development of pre-training techniques, a number of language models have been pre-trained on large-scale code corpora and perform well in code generation. In this paper, we investigate how to equip pre-trained language models…

Programming Languages · Computer Science 2022-11-01 Daoguang Zan, Bei Chen, Zeqi Lin, Bei Guan, Yongji Wang, Jian-Guang Lou

DocPrompting: Generating Code by Retrieving the Docs

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…

Computation and Language · Computer Science 2023-02-21 Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig

Private-Library-Oriented Code Generation with Large Language Models

Large language models (LLMs), such as Codex and GPT-4, have recently showcased their remarkable code generation abilities, facilitating a significant boost in coding efficiency. This paper will delve into utilizing LLMs for code generation…

Software Engineering · Computer Science 2023-07-31 Daoguang Zan, Bei Chen, Yongshun Gong, Junzhi Cao, Fengji Zhang, Bingchao Wu, Bei Guan, Yilong Yin, Yongji Wang

The Code2Text Challenge: Text Generation in Source Code Libraries

We propose a new shared task for tactical data-to-text generation in the domain of source code libraries. Specifically, we focus on text generation of function descriptions from example software projects. Data is drawn from existing…

Computation and Language · Computer Science 2018-07-12 Kyle Richardson, Sina Zarrieß, Jonas Kuhn

Deep Learning Based Code Generation Methods: Literature Review

This paper focuses on Code Generation task that aims at generating relevant code fragments according to given natural language descriptions. In the process of software development, developers often encounter two scenarios. One is requested…

Software Engineering · Computer Science 2024-04-19 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Ge Li, Michael Lyu

CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Code generation is a longstanding challenge, aiming to generate a code snippet based on a natural language description. Usually, expensive text-code paired data is essential for training a code generation model. Recently, thanks to the…

Software Engineering · Computer Science 2022-06-15 Daoguang Zan, Bei Chen, Dejian Yang, Zeqi Lin, Minsu Kim, Bei Guan, Yongji Wang, Weizhu Chen, Jian-Guang Lou

Formal Fields: A Framework to Automate Code Generation Across Domains

Code generation, defined as automatically writing a piece of code to solve a given problem for which an evaluation function exists, is a classic hard AI problem. Its general form, writing code using a general language used by human…

Artificial Intelligence · Computer Science 2020-07-29 Jacques Basaldúa

A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware

LLM-based code generation tools are essential to help developers in the software development process. Existing tools often disconnect with the working context, i.e., the code repository, causing the generated code to be not similar to human…

Software Engineering · Computer Science 2024-10-29 Dianshu Liao, Shidong Pan, Xiaoyu Sun, Xiaoxue Ren, Qing Huang, Zhenchang Xing, Huan Jin, Qinying Li

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing…

Computation and Language · Computer Science 2020-04-21 Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

NoviCode: Generating Programs from Natural Language Utterances by Novices

Current Text-to-Code models demonstrate impressive capabilities in generating executable code from natural language snippets. However, current studies focus on technical instructions and programmer-oriented language, and it is an open…

Computation and Language · Computer Science 2024-07-17 Asaf Achi Mordechai, Yoav Goldberg, Reut Tsarfaty

To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches…

Software Engineering · Computer Science 2026-03-30 Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li

Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation

Code example is a crucial part of good documentation. It helps the developers to understand the documentation easily and use the corresponding code unit (e.g., method) properly. However, many official documentation still lacks (good) code…

Software Engineering · Computer Science 2023-03-28 Junaed Younus Khan, Gias Uddin

ToolCoder: Teach Code Generation Models to use API search tools

Automatically generating source code from natural language descriptions has been a growing field of research in recent years. However, current large-scale code generation models often encounter difficulties when selecting appropriate APIs…

Software Engineering · Computer Science 2023-09-12 Kechi Zhang, Huangzhao Zhang, Ge Li, Jia Li, Zhuo Li, Zhi Jin

ReadMe.LLM: A Framework to Help LLMs Understand Your Library

Large Language Models (LLMs) often struggle with code generation tasks involving niche software libraries. Existing code generation techniques with only human-oriented documentation can fail -- even when the LLM has access to web search and…

Software Engineering · Computer Science 2025-05-09 Sandya Wijaya, Jacob Bolano, Alejandro Gomez Soteres, Shriyanshu Kode, Yue Huang, Anant Sahai

Adaptoring: Adapter Generation to Provide an Alternative API for a Library

Third-party libraries are a cornerstone of fast application development. To enable efficient use, libraries must provide a well-designed API. An obscure API instead slows down the learning process and can lead to erroneous use. The usual…

Software Engineering · Computer Science 2024-01-17 Lars Reimann, Günter Kniesel-Wünsche

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

CodeDSI: Differentiable Code Search

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science 2022-10-04 Usama Nadeem, Noah Ziems, Shaoen Wu