Related papers: Grammar Compressed Sequences with Rank/Select Supp…

Grammar-Based Graph Compression

We present a new graph compressor that works by recursively detecting repeated substructures and representing them through grammar rules. We show that for a large number of graphs the compressor obtains smaller representations than other…

Data Structures and Algorithms · Computer Science 2017-04-19 Sebastian Maneth, Fabian Peternek

Optimal Lower and Upper Bounds for Representing Sequences

Sequence representations supporting queries $access$, $select$ and $rank$ are at the core of many data structures. There is a considerable gap between the various upper bounds and the few lower bounds known for such representations, and how…

Data Structures and Algorithms · Computer Science 2013-08-26 Djamal Belazzougui, Gonzalo Navarro

Practical Random Access to SLP-Compressed Texts

Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as…

Data Structures and Algorithms · Computer Science 2020-07-21 Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

Improved Grammar-Based Compressed Indexes

We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of…

Data Structures and Algorithms · Computer Science 2011-10-21 Francisco Claude, Gonzalo Navarro

Rank and select: Another lesson learned

Rank and select queries on bitmaps are essential building bricks of many compressed data structures, including text indexes, membership and range supporting spatial data structures, compressed graphs, and more. Theoretically considered yet…

Data Structures and Algorithms · Computer Science 2016-05-13 Szymon Grabowski, Marcin Raniszewski

Rank, select and access in grammar-compressed strings

Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the…

Data Structures and Algorithms · Computer Science 2014-08-15 Djamal Belazzougui, Simon J. Puglisi, Yasuo Tabei

Random Access to Grammar Compressed Strings

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel…

Data Structures and Algorithms · Computer Science 2013-10-30 Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, Oren Weimann

Low-Rank Constraints for Fast Inference in Structured Models

Structured distributions, i.e. distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by the high computational and memory…

Computation and Language · Computer Science 2022-01-11 Justin T. Chiu, Yuntian Deng, Alexander M. Rush

Engineering Rank/Select Data Structures for Large-Alphabet Strings

Large-alphabet strings are common in scenarios such as information retrieval and natural-language processing. The efficient storage and processing of such strings usually introduces several challenges that are not witnessed in…

Data Structures and Algorithms · Computer Science 2024-05-03 Diego Arroyuelo, Gabriel Carmona, Héctor Larrañaga, Francisco Riveros, Carlos Eugenio Rojas-Morales, Erick Sepúlveda

Compressed String Dictionaries

The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…

Data Structures and Algorithms · Computer Science 2011-01-31 Nieves R. Brisaboa, Rodrigo Cánovas, Miguel A. Martínez-Prieto, Gonzalo Navarro

Traversing Grammar-Compressed Trees with Constant Delay

A grammar-compressed ranked tree is represented with a linear space overhead so that a single traversal step, i.e., the move to the parent or the i-th child, can be carried out in constant time. Moreover, we extend our data structure such…

Data Structures and Algorithms · Computer Science 2015-11-11 Markus Lohrey, Sebastian Maneth, Carl Philipp Reh

Compressed Indexing with Signature Grammars

The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…

Data Structures and Algorithms · Computer Science 2018-04-12 Anders Roy Christiansen, Mikko Berggren Ettienne

Learning Directly from Grammar Compressed Text

Neural networks using numerous text data have been successfully applied to a variety of tasks. While massive text data is usually compressed using techniques such as grammar compression, almost all of the previous machine learning methods…

Machine Learning · Statistics 2020-03-02 Yoichi Sasaki, Kosuke Akimoto, Takanori Maehara

Improved Compressed String Dictionaries

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix…

Data Structures and Algorithms · Computer Science 2019-11-20 Nieves R. Brisaboa, Ana Cerdeira-Pena, Guillermo de Bernardo, Gonzalo Navarro

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

Regular Expression Search on Compressed Text

We present an algorithm for searching regular expression matches in compressed text. The algorithm reports the number of matching lines in the uncompressed text in time linear in the size of its compressed version. We define efficient data…

Formal Languages and Automata Theory · Computer Science 2019-01-17 Pierre Ganty, Pedro Valero

Efficient Analysis of Complex Diagrams using Constraint-Based Parsing

This paper describes substantial advances in the analysis (parsing) of diagrams using constraint grammars. The addition of set types to the grammar and spatial indexing of the data make it possible to efficiently parse real diagrams of…

cmp-lg · Computer Science 2008-02-03 Robert P. Futrelle, Nikos Nikolakis

Grammar Index By Induced Suffix Sorting

Pattern matching is the most central task for text indices. Most recent indices leverage compression techniques to make pattern matching feasible for massive but highly-compressible datasets. Within this kind of indices, we propose a new…

Data Structures and Algorithms · Computer Science 2021-05-31 Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Counting on General Run-Length Grammars

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…

Data Structures and Algorithms · Computer Science 2025-01-30 Gonzalo Navarro, Alejandro Pacheco

Representing Sentences as Low-Rank Subspaces

Sentences are important semantic units of natural language. A generic, distributional representation of sentences that can capture the latent semantics is beneficial to multiple downstream applications. We observe a simple geometry of…

Computation and Language · Computer Science 2017-04-19 Jiaqi Mu, Suma Bhat, Pramod Viswanath