Related papers: Recursive Programs for Document Spanners

SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from the industry and academia. Over the past decade, the framework has been studied thoroughly in terms…

Databases · Computer Science 2024-09-05 Dean Light, Ahmad Aiashy, Mahmoud Diab, Daniel Nachmias, Stijn Vansummeren, Benny Kimelfeld

Grammars for Document Spanners

We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied…

Databases · Computer Science 2023-01-25 Liat Peterfreund

The Complexity of Aggregates over Extractions by Regular Expressions

Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the…

Databases · Computer Science 2024-02-14 Johannes Doleschal, Benny Kimelfeld, Wim Martens

A framework for extraction and transformation of documents

We present a theoretical framework for the extraction and transformation of text documents. We propose to use a two-phase process where the first phase extracts span-tuples from a document, and the second phase maps the content of the…

Databases · Computer Science 2024-05-22 Cristian Riveros, Markus L. Schmid, Nicole Schweikardt

Constant delay algorithms for regular document spanners

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants…

Databases · Computer Science 2018-03-15 Fernando Florenzano, Cristian Riveros, Martin Ugarte, Stijn Vansummeren, Domagoj Vrgoc

Complexity Bounds for Relational Algebra over Document Spanners

We investigate the complexity of evaluating queries in Relational Algebra (RA) over the relations extracted by regex formulas (i.e., regular expressions with capture variables) over text documents. Such queries, also known as the regular…

Databases · Computer Science 2019-02-07 Liat Peterfreund, Dominik D. Freydenberger, Benny Kimelfeld, Markus Kröll

Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with…

Databases · Computer Science 2018-01-01 Francisco Maturana, Cristian Riveros, Domagoj Vrgoč

A Span Extraction Approach for Information Extraction on Visually-Rich Documents

Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we…

Artificial Intelligence · Computer Science 2021-07-07 Tuan-Anh D. Nguyen, Hieu M. Vu, Nguyen Hong Son, Minh-Tien Nguyen

Joining Extractions of Regular Expressions

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document…

Databases · Computer Science 2017-03-31 Dominik D. Freydenberger, Benny Kimelfeld, Liat Peterfreund

Learning Recurrent Span Representations for Extractive Question Answering

The reading comprehension task, that asks questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of…

Computation and Language · Computer Science 2017-03-21 Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant

Exact Recursive Probabilistic Programming

Recursive calls over recursive data are useful for generating probability distributions, and probabilistic programming allows computations over these distributions to be expressed in a modular and intuitive way. Exact inference is also…

Programming Languages · Computer Science 2023-03-28 David Chiang, Colin McDonald, Chung-chieh Shan

From Regexes to Parsing Expression Grammars

Most scripting languages nowadays use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an informal semantics that is different from the semantics of regular expressions, removing the…

Formal Languages and Automata Theory · Computer Science 2014-02-17 Sérgio Medeiros, Fabio Mascarenhas, Roberto Ierusalimschy

Spanner Evaluation over SLP-Compressed Documents

We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line…

Data Structures and Algorithms · Computer Science 2021-01-27 Markus L. Schmid, Nicole Schweikardt

Translating Recursive Probabilistic Programs to Factor Graph Grammars

It is natural for probabilistic programs to use conditionals to express alternative substructures in models, and loops (recursion) to express repeated substructures in models. Thus, probabilistic programs with conditionals and recursion…

Programming Languages · Computer Science 2020-10-26 David Chiang, Chung-chieh Shan

FC-Datalog as a Framework for Efficient String Querying

Core spanners are a class of document spanners that capture the core functionality of IBM's AQL. FC is a logic on strings built around word equations that when extended with constraints for regular languages can be seen as a logic for core…

Logic in Computer Science · Computer Science 2025-01-20 Owen M. Bell, Joel D. Day, Dominik D. Freydenberger

Probing Representations for Document-level Event Extraction

The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentence-level NLP tasks. This work is…

Computation and Language · Computer Science 2023-10-25 Barry Wang, Xinya Du, Claire Cardie

SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional…

Computation and Language · Computer Science 2016-11-15 Ramesh Nallapati, Feifei Zhai, Bowen Zhou

Introduction to Searching with Regular Expressions

The explosive rate of information growth and availability often makes it increasingly difficult to locate information pertinent to your needs. These problems are often compounded when keyword based search methodologies are not adequate for…

Information Retrieval · Computer Science 2008-10-10 Christopher M. Frenz

Split-Correctness in Information Extraction

Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so…

Databases · Computer Science 2021-05-21 Johannes Doleschal, Benny Kimelfeld, Wim Martens, Frank Neven, Matthias Niewerth

Towards Better Document-level Relation Extraction via Iterative Inference

Document-level relation extraction (RE) aims to extract the relations between entities from the input document that usually containing many difficultly-predicted entity pairs whose relations can only be predicted through relational…

Computation and Language · Computer Science 2022-11-29 Liang Zhang, Jinsong Su, Yidong Chen, Zhongjian Miao, Zijun Min, Qingguo Hu, Xiaodong Shi