Related papers: Mining Documentation to Extract Hyperparameter Sch…

200 papers

Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level…

Software Engineering · Computer Science 2023-11-21 Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Melina Vidoni

Data exploration is an important step of every data science and machine learning project, including those involving textual data. We provide a novel language tool, in the form of a publicly available Python library for extracting patterns…

Computation and Language · Computer Science 2022-06-20 Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni

Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, Tensorflow, NumPy, Pandas, and the like. Program synthesizers can provide significant coding assistance to this…

Software Engineering · Computer Science 2022-05-19 Daye Nam, Baishakhi Ray, Seohyun Kim, Xianshan Qu, Satish Chandra

Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to…

Software Engineering · Computer Science 2022-09-29 Jialiang Lin, Yingmin Wang, Yao Yu, Yu Zhou, Yidong Chen, Xiaodong Shi

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Cong Yao

Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of…

Computation and Language · Computer Science 2017-07-10 Antonio Valerio Miceli Barone, Rico Sennrich

In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into…

Computation and Language · Computer Science 2024-11-22 Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter

The project, under industrial funding, presented in this publication aims at the semantic analysis of a normative document describing requirements applicable to electrical appliances. The objective of the project is to build a semantic…

Information Retrieval · Computer Science 2021-12-28 Helene de Ribaupierre, Anne-Francoise Cutting-Decelle, Nathalie Baumier, Serge Blumental

Geoscientists, as well as researchers in many fields, need to read a huge amount of literature to locate, extract, and aggregate relevant results and data to enable future research or to build a scientific database, but there is no existing…

Human-Computer Interaction · Computer Science 2022-02-25 Shao Zhang, Yuting Jia, Hui Xu, Ying Wen, Dakuo Wang, Xinbing Wang

We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain information of interest, and to find the…

Computation and Language · Computer Science 2007-05-23 Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, Tomaz Erjavec

Processing large amounts of data is an essential problem of the big data era. Most of the data exchange is done via direct communication (using APIs) and well-structured file formats (JSON, XML, EDI, etc.), but a significant portion of the…

Information Retrieval · Computer Science 2020-07-17 Vladimir Bernstein, Andrei Afanassenkov

Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train…

Computation and Language · Computer Science 2020-10-26 Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran

Tracking progress in machine learning has become increasingly difficult with the recent explosion in the number of papers. In this paper, we present AxCell, an automatic machine learning pipeline for extracting results from papers. AxCell…

Computation and Language · Computer Science 2020-04-30 Marcin Kardas, Piotr Czapla, Pontus Stenetorp, Sebastian Ruder, Sebastian Riedel, Ross Taylor, Robert Stojnic

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that…

Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as JavaDocs and…

Software Engineering · Computer Science 2019-12-24 Zachary Eberhart, Alexander LeClair, Collin McMillan

This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such…

Machine Learning · Computer Science 2020-02-27 Franziska Horn, Robert Pack, Michael Rieger

Hyperparameter optimization and neural architecture search can become prohibitively expensive for regular black-box Bayesian optimization because the training and evaluation of a single model can easily take several hours. To overcome this,…

The analyst effort in data cleaning is gradually shifting away from the design of hand-written scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper-parameter tuning for data cleaning is very different…

Databases · Computer Science 2019-05-08 Sanjay Krishnan, Eugene Wu

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very…

Computation and Language · Computer Science 2020-10-12 Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

Semi-structured data formats such as JSON have proved to be useful data models for applications that require flexibility in the format of data stored. However, JSON data often come without the schemas that are typically available with…

Databases · Computer Science 2024-07-04 Michael J. Mior