Related papers: Constant delay algorithms for regular document spa…

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2020-12-08 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2023-09-06 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Constant-delay enumeration for SLP-compressed documents

We study the problem of enumerating results from a query over a compressed document. The model we use for compression is that of straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our…

Data Structures and Algorithms · Computer Science 2025-02-26 Martín Muñoz, Cristian Riveros

Grammars for Document Spanners

We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied…

Databases · Computer Science 2023-01-25 Liat Peterfreund

Spanner Evaluation over SLP-Compressed Documents

We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line…

Data Structures and Algorithms · Computer Science 2021-01-27 Markus L. Schmid, Nicole Schweikardt

Recursive Programs for Document Spanners

A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A…

Databases · Computer Science 2018-05-24 Liat Peterfreund, Balder ten Cate, Ronald Fagin, Benny Kimelfeld

Automata-based constraints for language model decoding

Language models (LMs) are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee…

Computation and Language · Computer Science 2024-08-06 Terry Koo, Frederick Liu, Luheng He

Complexity Bounds for Relational Algebra over Document Spanners

We investigate the complexity of evaluating queries in Relational Algebra (RA) over the relations extracted by regex formulas (i.e., regular expressions with capture variables) over text documents. Such queries, also known as the regular…

Databases · Computer Science 2019-02-07 Liat Peterfreund, Dominik D. Freydenberger, Benny Kimelfeld, Markus Kröll

Split-Correctness in Information Extraction

Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so…

Databases · Computer Science 2021-05-21 Johannes Doleschal, Benny Kimelfeld, Wim Martens, Frank Neven, Matthias Niewerth

A framework for extraction and transformation of documents

We present a theoretical framework for the extraction and transformation of text documents. We propose to use a two-phase process where the first phase extracts span-tuples from a document, and the second phase maps the content of the…

Databases · Computer Science 2024-05-22 Cristian Riveros, Markus L. Schmid, Nicole Schweikardt

Learning Recurrent Span Representations for Extractive Question Answering

The reading comprehension task, which asks questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of…

Computation and Language · Computer Science 2017-03-21 Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant

Weight Annotation in Information Extraction

The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document's spans (intervals identified by their start and end indices). For…

Databases · Computer Science 2023-06-22 Johannes Doleschal, Benny Kimelfeld, Wim Martens, Liat Peterfreund

The Complexity of Aggregates over Extractions by Regular Expressions

Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the…

Databases · Computer Science 2024-02-14 Johannes Doleschal, Benny Kimelfeld, Wim Martens

SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional…

Computation and Language · Computer Science 2016-11-15 Ramesh Nallapati, Feifei Zhai, Bowen Zhou

Neural Summarization by Extracting Sentences and Words

Traditional approaches to extractive summarization rely heavily on human-engineered features. In this work we propose a data-driven approach based on neural networks and continuous sentence features. We develop a general framework for…

Computation and Language · Computer Science 2016-07-04 Jianpeng Cheng, Mirella Lapata

Regular expressions for decoding of neural network outputs

This article proposes a convenient tool for decoding the output of neural networks trained by Connectionist Temporal Classification (CTC) for handwritten text recognition. We use regular expressions to describe the complex structures…

Neural and Evolutionary Computing · Computer Science 2016-03-31 Tobias Strauß, Gundram Leifert, Tobias Grüning, Roger Labahn

A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora

A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence-based answer for a particular clinical question. The process is manual and involves a lot of time. A…

Information Retrieval · Computer Science 2016-06-22 Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda

Neural Document Summarization by Jointly Learning to Score and Select Sentences

Sentence scoring and sentence selection are two main steps in extractive document summarization systems. However, previous works treat them as two separate subtasks. In this paper, we present a novel end-to-end neural network framework for…

Computation and Language · Computer Science 2018-07-09 Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao

Refl-Spanners: A Purely Regular Approach to Non-Regular Core Spanners

The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join and projection, and have desirable algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld, Reiss, and…

Databases · Computer Science 2024-11-27 Markus L. Schmid, Nicole Schweikardt

Distraction-Based Neural Networks for Document Summarization

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to…

Computation and Language · Computer Science 2016-10-27 Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang