Related papers: Mining Documentation to Extract Hyperparameter Sch…

DocGen: Generating Detailed Parameter Docstrings in Python

Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level…

Software Engineering · Computer Science 2023-11-21 Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Melina Vidoni

GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns

Data exploration is an important step of every data science and machine learning project, including those involving textual data. We provide a novel language tool, in the form of a publicly available Python library for extracting patterns…

Computation and Language · Computer Science 2022-06-20 Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni

Predictive Synthesis of API-Centric Code

Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, Tensorflow, NumPy, Pandas, and the like. Program synthesizers can provide significant coding assistance to this…

Software Engineering · Computer Science 2022-05-19 Daye Nam, Baishakhi Ray, Seohyun Kim, Xianshan Qu, Satish Chandra

Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers

Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to…

Software Engineering · Computer Science 2022-09-29 Jialiang Lin, Yingmin Wang, Yao Yu, Yu Zhou, Yidong Chen, Xiaodong Shi

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Cong Yao

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of…

Computation and Language · Computer Science 2017-07-10 Antonio Valerio Miceli Barone, Rico Sennrich

Schema-Driven Information Extraction from Heterogeneous Tables

In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into…

Computation and Language · Computer Science 2024-11-22 Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter

Automatic extraction of requirements expressed in industrial standards : a way towards machine readable standards ?

The project, under industrial funding, presented in this publication aims at the semantic analysis of a normative document describing requirements applicable to electrical appliances. The objective of the project is to build a semantic…

Information Retrieval · Computer Science 2021-12-28 Helene de Ribaupierre, Anne-Francoise Cutting-Decelle, Nathalie Baumier, Serge Blumental

DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance

Geoscientists, as well as researchers in many fields, need to read a huge amount of literature to locate, extract, and aggregate relevant results and data to enable future research or to build a scientific database, but there is no existing…

Human-Computer Interaction · Computer Science 2022-02-25 Shao Zhang, Yuting Jia, Hui Xu, Ying Wen, Dakuo Wang, Xinbing Wang

A tool set for the quick and efficient exploration of large document collections

We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain information of interest, and to find the…

Computation and Language · Computer Science 2007-05-23 Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, Tomaz Erjavec

Unsupervised Data Extraction from Computer-generated Documents with Single Line Formatting

Processing large amounts of data is an essential problem of the big data era. Most of the data exchange is done via direct communication (using APIs) and well-structured file formats (JSON, XML, EDI, etc.), but a significant portion of the…

Information Retrieval · Computer Science 2020-07-17 Vladimir Bernstein, Andrei Afanassenkov

A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents

Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train…

Computation and Language · Computer Science 2020-10-26 Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran

AxCell: Automatic Extraction of Results from Machine Learning Papers

Tracking progress in machine learning has become increasingly difficult with the recent explosion in the number of papers. In this paper, we present AxCell, an automatic machine learning pipeline for extracting results from papers. AxCell…

Computation and Language · Computer Science 2020-04-30 Marcin Kardas, Piotr Czapla, Pontus Stenetorp, Sebastian Ruder, Sebastian Riedel, Ross Taylor, Robert Stojnic

LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that…

Computation and Language · Computer Science 2025-04-02 Sameer Sadruddin, Jennifer D'Souza, Eleni Poupaki, Alex Watkins, Hamed Babaei Giglou, Anisa Rula, Bora Karasulu, Sören Auer, Adrie Mackus, Erwin Kessels

Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as JavaDocs and…

Software Engineering · Computer Science 2019-12-24 Zachary Eberhart, Alexander LeClair, Collin McMillan

The autofeat Python Library for Automated Feature Engineering and Selection

This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such…

Machine Learning · Computer Science 2020-02-27 Franziska Horn, Robert Pack, Michael Rieger

BOAH: A Tool Suite for Multi-Fidelity Bayesian Optimization & Analysis of Hyperparameters

Hyperparameter optimization and neural architecture search can become prohibitively expensive for regular black-box Bayesian optimization because the training and evaluation of a single model can easily take several hours. To overcome this,…

Machine Learning · Computer Science 2019-08-20 Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Joshua Marben, Philipp Müller, Frank Hutter

AlphaClean: Automatic Generation of Data Cleaning Pipelines

The analyst effort in data cleaning is gradually shifting away from the design of hand-written scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper-parameter tuning for data cleaning is very different…

Databases · Computer Science 2019-05-08 Sanjay Krishnan, Eugene Wu

Scaling Systematic Literature Reviews with Machine Learning Pipelines

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

Large Language Models for JSON Schema Discovery

Semi-structured data formats such as JSON have proved to be useful data models for applications that require flexibility in the format of data stored. However, JSON data often come without the schemas that are typically available with…

Databases · Computer Science 2024-07-04 Michael J. Mior