Related papers: Spanner Evaluation over SLP-Compressed Documents

200 papers

We study the problem of enumerating results from a query over a compressed document. The compression model we use is the straight-line program (SLP), defined by a context-free grammar that produces a single string. For our…

Data Structures and Algorithms · Computer Science 2025-02-26 Martín Muñoz, Cristian Riveros

We study the problem of enumerating the answers to a query formulated in monadic second order logic (MSO) over an unranked forest F that is compressed by a straight-line program (SLP) D. Our main result states that this can be done after…

Formal Languages and Automata Theory · Computer Science 2026-03-17 Markus Lohrey, Markus L. Schmid

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants…

Databases · Computer Science 2018-03-15 Fernando Florenzano, Cristian Riveros, Martin Ugarte, Stijn Vansummeren, Domagoj Vrgoc

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2023-09-06 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2020-12-08 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…

Data Structures and Algorithms · Computer Science 2011-11-10 Alexander Tiskin

In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size…

Data Structures and Algorithms · Computer Science 2021-07-02 Moses Ganardi

In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e., context-free grammars generating exactly one string; the term fully means that both the pattern…

Data Structures and Algorithms · Computer Science 2013-06-26 Artur Jeż

We study the compressed representation of a ranked tree by a (string) straight-line program (SLP) for its preorder traversal, and compare it with the well-studied representation by straight-line context free tree grammars (which are also…

Formal Languages and Automata Theory · Computer Science 2015-09-29 Moses Ganardi, Danny Hucke, Markus Lohrey, Eric Noeth

Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing…

Data Structures and Algorithms · Computer Science 2007-05-23 Yury Lifshits

Information extraction from textual data, where the query is represented by a finite transducer and the task is to enumerate all results without repetition, and its extension to the weighted case, where each output element has a weight and…

Data Structures and Algorithms · Computer Science 2024-10-08 Pawel Gawrychowski, Florin Manea, Markus L. Schmid

We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $\delta$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are…

Data Structures and Algorithms · Computer Science 2024-02-16 Gonzalo Navarro, Cristian Urbina

Retrieving documents and prepending them in-context at inference time improves performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more…

Computation and Language · Computer Science 2023-10-09 Fangyuan Xu, Weijia Shi, Eunsol Choi

We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the…

Data Structures and Algorithms · Computer Science 2013-04-29 Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion…

Computation and Language · Computer Science 2026-04-16 Corentin Kervadec, Iuliia Lysova, Marco Baroni, Gemma Boleda

A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A…

Databases · Computer Science 2018-05-24 Liat Peterfreund, Balder ten Cate, Ronald Fagin, Benny Kimelfeld

While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can…

Computation and Language · Computer Science 2022-12-01 Joel Stremmel, Brian L. Hill, Jeffrey Hertzberg, Jaime Murillo, Llewelyn Allotey, Eran Halperin

Scan-based operations, such as backstage compaction and value filtering, have emerged as the main bottleneck for LSM-Trees in supporting contemporary data-intensive applications. For slower external storage devices, such as HDD and SATA…

Databases · Computer Science 2025-08-19 Jianfeng Huang, Ziyao Wang, Lin Yuan, Jiajie Wen, Yihao Cao, Dongjing Miao, Yong Wang, Jiahao Zhang

As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. While traditional metrics like Matrix Entropy offer valuable insights,…

Computation and Language · Computer Science 2025-06-04 Yahan Li, Tingyu Xia, Yi Chang, Yuan Wu

Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given…

Computational Complexity · Computer Science 2018-03-05 Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann