Related papers: Weight Annotation in Information Extraction

Grammars for Document Spanners

We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied…

Databases · Computer Science 2023-01-25 Liat Peterfreund

Constant delay algorithms for regular document spanners

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants…

Databases · Computer Science 2018-03-15 Fernando Florenzano, Cristian Riveros, Martin Ugarte, Stijn Vansummeren, Domagoj Vrgoc

A Span Extraction Approach for Information Extraction on Visually-Rich Documents

Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we…

Artificial Intelligence · Computer Science 2021-07-07 Tuan-Anh D. Nguyen, Hieu M. Vu, Nguyen Hong Son, Minh-Tien Nguyen

Complexity Bounds for Relational Algebra over Document Spanners

We investigate the complexity of evaluating queries in Relational Algebra (RA) over the relations extracted by regex formulas (i.e., regular expressions with capture variables) over text documents. Such queries, also known as the regular…

Databases · Computer Science 2019-02-07 Liat Peterfreund, Dominik D. Freydenberger, Benny Kimelfeld, Markus Kröll

The Complexity of Aggregates over Extractions by Regular Expressions

Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the…

Databases · Computer Science 2024-02-14 Johannes Doleschal, Benny Kimelfeld, Wim Martens

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2020-12-08 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

AMR Parsing with Action-Pointer Transformer

Abstract Meaning Representation parsing is a sentence-to-graph prediction task where target nodes are not explicitly aligned to sentence tokens. However, since graph nodes are semantically based on one or more sentence tokens, implicit…

Computation and Language · Computer Science 2021-05-19 Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Radu Florian

Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness

A notable challenge in Multi-Document Summarization (MDS) is the extremely-long length of the input. In this paper, we present an extract-then-abstract Transformer framework to overcome the problem. Specifically, we leverage pre-trained…

Computation and Language · Computer Science 2022-05-05 Yun-Zhu Song, Yi-Syuan Chen, Hong-Han Shuai

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2023-09-06 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Weighted Rewriting: Semiring Semantics for Abstract Reduction Systems

We present novel semiring semantics for abstract reduction systems (ARSs). More precisely, we provide a weighted version of ARSs, where the reduction steps induce weights from a semiring. Inspired by provenance analysis in database theory…

Logic in Computer Science · Computer Science 2025-05-14 Emma Ahrens, Jan-Christoph Kassing, Jürgen Giesl, Joost-Pieter Katoen

A General Information Extraction Framework Based on Formal Languages

For a terminal alphabet $\Sigma$ and an attribute alphabet $\Gamma$, a $(\Sigma, \Gamma)$-extractor is a function that maps every string over $\Sigma$ to a table with a column per attribute and with sets of positions of $w$ as cell entries.…

Formal Languages and Automata Theory · Computer Science 2026-03-18 Markus L. Schmid

Efficient Argument Structure Extraction with Transfer Learning and Active Learning

The automation of extracting argument structures faces a pair of challenges on (1) encoding long-term contexts to facilitate comprehensive understanding, and (2) improving data efficiency since constructing high-quality argument structures…

Computation and Language · Computer Science 2022-04-05 Xinyu Hua, Lu Wang

Enhanced Language Representation with Label Knowledge for Span Extraction

Span extraction, aiming to extract text spans (such as words or phrases) from plain texts, is a fundamental process in Information Extraction. Recent works introduce the label knowledge to enhance the text representation by formalizing the…

Computation and Language · Computer Science 2021-11-02 Pan Yang, Xin Cong, Zhenyun Sun, Xingwu Liu

Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning

Transformer-based language models display impressive reasoning-like behavior, yet remain brittle on tasks that require stable symbolic manipulation. This paper develops a unified perspective on these phenomena by interpreting self-attention…

Artificial Intelligence · Computer Science 2025-12-18 Sahil Rajesh Dhayalkar

Split-Correctness in Information Extraction

Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so…

Databases · Computer Science 2021-05-21 Johannes Doleschal, Benny Kimelfeld, Wim Martens, Frank Neven, Matthias Niewerth

Keyphrase Annotation with Graph Co-Ranking

Keyphrase annotation is the task of identifying textual units that represent the main content of a document. Keyphrase annotation is either carried out by extracting the most important phrases from a document, keyphrase extraction, or by…

Computation and Language · Computer Science 2016-11-08 Adrien Bougouin, Florian Boudin, Béatrice Daille

Learning Recurrent Span Representations for Extractive Question Answering

The reading comprehension task, that asks questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of…

Computation and Language · Computer Science 2017-03-21 Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant

Extracting Sentence Embeddings from Pretrained Transformer Models

Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in…

Computation and Language · Computer Science 2025-02-21 Lukas Stankevičius, Mantas Lukoševičius

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge. To this end, many efforts have been made to extract finite…

Computation and Language · Computer Science 2023-06-27 Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun

Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization

Attention mechanism plays a dominant role in the sequence generation models and has been used to improve the performance of machine translation and abstractive text summarization. Different from neural machine translation, in the task of…

Computation and Language · Computer Science 2020-04-09 Piji Li, Lidong Bing, Zhongyu Wei, Wai Lam