Related papers: Spanner Evaluation over SLP-Compressed Documents

Constant-delay enumeration for SLP-compressed documents

We study the problem of enumerating results from a query over a compressed document. The compression model we use is the straight-line program (SLP), which is defined by a context-free grammar that produces a single string. For our…

Data Structures and Algorithms · Computer Science 2025-02-26 Martín Muñoz, Cristian Riveros

MSO-Enumeration Over SLP-Compressed Unranked Forests

We study the problem of enumerating the answers to a query formulated in monadic second order logic (MSO) over an unranked forest F that is compressed by a straight-line program (SLP) D. Our main result states that this can be done after…

Formal Languages and Automata Theory · Computer Science 2026-03-17 Markus Lohrey, Markus L. Schmid

Constant delay algorithms for regular document spanners

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants…

Databases · Computer Science 2018-03-15 Fernando Florenzano, Cristian Riveros, Martin Ugarte, Stijn Vansummeren, Domagoj Vrgoc

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2023-09-06 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential…

Databases · Computer Science 2020-12-08 Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

Faster subsequence recognition in compressed strings

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…

Data Structures and Algorithms · Computer Science 2011-11-10 Alexander Tiskin

Compression by Contracting Straight-Line Programs

In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size…

Data Structures and Algorithms · Computer Science 2021-07-02 Moses Ganardi

Faster fully compressed pattern matching by recompression

In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e., context-free grammars generating exactly one string; the term fully means that both the pattern…

Data Structures and Algorithms · Computer Science 2013-06-26 Artur Jeż

Tree compression using string grammars

We study the compressed representation of a ranked tree by a (string) straight-line program (SLP) for its preorder traversal, and compare it with the well-studied representation by straight-line context free tree grammars (which are also…

Formal Languages and Automata Theory · Computer Science 2015-09-29 Moses Ganardi, Danny Hucke, Markus Lohrey, Eric Noeth

Solving Classical String Problems on Compressed Texts

Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing…

Data Structures and Algorithms · Computer Science 2007-05-23 Yury Lifshits

Revisiting Weighted Information Extraction: A Simpler and Faster Algorithm for Ranked Enumeration

Information extraction from textual data, where the query is represented by a finite transducer and the task is to enumerate all results without repetition, and its extension to the weighted case, where each output element has a weight and…

Data Structures and Algorithms · Computer Science 2024-10-08 Pawel Gawrychowski, Florin Manea, Markus L. Schmid

Iterated Straight-Line Programs

We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $\delta$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are…

Data Structures and Algorithms · Computer Science 2024-02-16 Gonzalo Navarro, Cristian Urbina

RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation

Retrieving documents and prepending them in-context at inference time improves performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more…

Computation and Language · Computer Science 2023-10-09 Fangyuan Xu, Weijia Shi, Eunsol Choi

Efficient Lyndon factorization of grammar compressed text

We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the…

Data Structures and Algorithms · Computer Science 2013-04-29 Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion…

Computation and Language · Computer Science 2026-04-16 Corentin Kervadec, Iuliia Lysova, Marco Baroni, Gemma Boleda

Recursive Programs for Document Spanners

A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A…

Databases · Computer Science 2018-05-24 Liat Peterfreund, Balder ten Cate, Ronald Fagin, Benny Kimelfeld

Extend and Explain: Interpreting Very Long Language Models

While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can…

Computation and Language · Computer Science 2022-12-01 Joel Stremmel, Brian L. Hill, Jeffrey Hertzberg, Jaime Murillo, Llewelyn Allotey, Eran Halperin

LSM-OPD: Boosting Scan in LSM-Trees by Enabling Direct Computing on Compressed Data

Scan-based operations, such as backstage compaction and value filtering, have emerged as the main bottleneck for LSM-Trees in supporting contemporary data-intensive applications. For slower external storage devices, such as HDD and SATA…

Databases · Computer Science 2025-08-19 Jianfeng Huang, Ziyao Wang, Lin Yuan, Jiajie Wen, Yihao Cao, Dongjing Miao, Yong Wang, Jiahao Zhang

Large Language Model Evaluation via Matrix Nuclear-Norm

As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. While traditional metrics like Matrix Entropy offer valuable insights,…

Computation and Language · Computer Science 2025-06-04 Yahan Li, Tingyu Xia, Yi Chang, Yuan Wu

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given…

Computational Complexity · Computer Science 2018-03-05 Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann