Related papers: DocGen: Generating Detailed Parameter Docstrings i…

CodeExp: Explanatory Code Document Generation

Developing models that can automatically generate detailed code explanation can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code…

Computation and Language · Computer Science 2022-11-29 Haotian Cui, Chenglong Wang, Junjie Huang, Jeevana Priya Inala, Todd Mytkowicz, Bo Wang, Jianfeng Gao, Nan Duan

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of…

Computation and Language · Computer Science 2017-07-10 Antonio Valerio Miceli Barone, Rico Sennrich

DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation

Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the…

Computation and Language · Computer Science 2024-10-04 Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose

DocuMint: Docstring Generation for Python using Small Language Models

Effective communication, specifically through documentation, is the beating heart of collaboration among contributors in software development. Recent advancements in language models (LMs) have enabled the introduction of a new type of actor…

Software Engineering · Computer Science 2024-05-17 Bibek Poudel, Adam Cook, Sekou Traore, Shelah Ameli

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

Synthetic data is a standard component in training large language models, yet systematic comparisons across design dimensions, including rephrasing strategy, generator model, and source data, remain absent. We conduct extensive controlled…

Computation and Language · Computer Science 2026-04-16 Joel Niklaus, Atsuki Yamaguchi, Michal Štefánik, Guilherme Penedo, Hynek Kydlíček, Elie Bakouch, Lewis Tunstall, Edward Emanuel Beeching, Thibaud Frere, Colin Raffel, Leandro von Werra, Thomas Wolf

Challenges in Data-to-Document Generation

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task,…

Computation and Language · Computer Science 2017-07-26 Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Auto-Documenation for Software Development

Software documentation is an essential but labor intensive task that often requires a dedicated team of developers to ensure coverage and accuracy. Good documentation will help shorten the development cycle and improve the overall team…

Software Engineering · Computer Science 2017-01-31 Thomas Zheng, Jeff Shaw, Sergey Kozlov

DocFetch - Towards Generating Software Documentation from Multiple Software Artifacts

Software Documentation plays a major role in the usage and development of a project. Widespread adoption of open source software projects contributes to larger and faster development of the projects, making it difficult to maintain the…

Software Engineering · Computer Science 2025-08-26 Akhila Sri Manasa Venigalla, Sridhar Chimalakonda

DocFusion: A Unified Framework for Document Parsing Tasks

Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications. However, existing methods often require integrating multiple independent models to…

Computation and Language · Computer Science 2025-05-23 Mingxu Chai, Ziyu Shen, Chong Zhang, Yue Zhang, Xiao Wang, Shihan Dou, Jihua Kang, Jiazheng Zhang, Qi Zhang

DocPrompting: Generating Code by Retrieving the Docs

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…

Computation and Language · Computer Science 2023-02-21 Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig

DocCGen: Document-based Controlled Code Generation

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical…

Software Engineering · Computer Science 2024-07-04 Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models advances the…

Machine Learning · Computer Science 2023-03-01 Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

An Analysis of Datasets, Metrics and Models in Keyphrase Generation

Keyphrase generation refers to the task of producing a set of words or phrases that summarises the content of a document. Continuous efforts have been dedicated to this task over the past few years, spreading across multiple lines of…

Information Retrieval · Computer Science 2025-06-13 Florian Boudin, Akiko Aizawa

Function-constrained Program Synthesis

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science 2023-12-05 Patrick Hajali, Ignas Budvytis

On Generating Extended Summaries of Long Documents

Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in some cases to know more detailed information…

Computation and Language · Computer Science 2020-12-29 Sajad Sotudeh, Arman Cohan, Nazli Goharian

Mining Documentation to Extract Hyperparameter Schemas

AI automation tools need machine-readable hyperparameter schemas to define their search spaces. At the same time, AI libraries often come with good human-readable documentation. While such documentation contains most of the necessary…

Machine Learning · Computer Science 2020-07-06 Guillaume Baudart, Peter D. Kirchner, Martin Hirzel, Kiran Kate

RepoSummary: Feature-Oriented Summarization and Documentation Generation for Code Repositories

Repository summarization is a crucial research question in development and maintenance for software engineering. Existing repository summarization techniques primarily focus on summarizing code according to the directory tree, which is…

Software Engineering · Computer Science 2025-10-14 Yifeng Zhu, Xianlin Zhao, Xutian Li, Yanzhen Zou, Haizhuo Yuan, Yue Wang, Bing Xie

Can Developers Prompt? A Controlled Experiment for Code Documentation Generation

Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise…

Artificial Intelligence · Computer Science 2025-07-09 Hans-Alexander Kruse, Tim Puhlfürß, Walid Maalej

Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training

The size and complexity of software applications is increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the…

Software Engineering · Computer Science 2024-06-14 Ivan R. Ivanov, Joachim Meyer, Aiden Grossman, William S. Moses, Johannes Doerfert

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Writing a survey paper on one research topic usually needs to cover the salient content from numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing the…

Computation and Language · Computer Science 2023-02-10 Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen