Related papers: Compressed Indexing for Consecutive Occurrences

Gapped Indexing for Consecutive Occurrences

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting…

Data Structures and Algorithms · Computer Science 2021-02-05 Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Teresa Anna Steiner

Data Structures for Range Sorted Consecutive Occurrence Queries

The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for…

Data Structures and Algorithms · Computer Science 2025-09-03 Waseem Akram, Takuya Mieno

Compressed Indexing with Signature Grammars

The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…

Data Structures and Algorithms · Computer Science 2018-04-12 Anders Roy Christiansen, Mikko Berggren Ettienne

String Indexing for Top-$k$ Close Consecutive Occurrences

The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In…

Data Structures and Algorithms · Computer Science 2024-02-15 Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Eva Rotenberg, Teresa Anna Steiner

Gapped String Indexing in Subquadratic Space and Sublinear Query Time

In Gapped String Indexing, the goal is to compactly represent a string $S$ of length $n$ such that for any query consisting of two strings $P_1$ and $P_2$, called patterns, and an integer interval $[\alpha, \beta]$, called gap range, we can…

Data Structures and Algorithms · Computer Science 2024-03-06 Philip Bille, Inge Li Gørtz, Moshe Lewenstein, Solon P. Pissis, Eva Rotenberg, Teresa Anna Steiner

Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings

Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…

Data Structures and Algorithms · Computer Science 2025-10-23 Rajat De, Dominik Kempa

Solving Classical String Problems on Compressed Texts

Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing…

Data Structures and Algorithms · Computer Science 2007-05-23 Yury Lifshits

Contextual Pattern Matching

The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all…

Data Structures and Algorithms · Computer Science 2020-10-15 Gonzalo Navarro

String Indexing with Compressed Patterns

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is…

Data Structures and Algorithms · Computer Science 2024-02-15 Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

The Complexity of Maximal Common Subsequence Enumeration

Frequent pattern mining is widely used to find "important" or "interesting" patterns in data. While it is not easy to mathematically define such patterns, maximal frequent patterns are promising candidates, as frequency is a natural…

Data Structures and Algorithms · Computer Science 2025-04-08 Giovanni Buzzega, Alessio Conte, Yasuaki Kobayashi, Kazuhiro Kurita, Giulia Punzi

Compressed Dictionary Matching on Run-Length Encoded Strings

Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…

Data Structures and Algorithms · Computer Science 2025-09-04 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of…

Data Structures and Algorithms · Computer Science 2012-04-17 Roberto Grossi, Giuseppe Ottaviano

A Faster Grammar-Based Self-Index

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with $r$ rules for a string $S[1..n]$ whose…

Data Structures and Algorithms · Computer Science 2012-09-28 Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, Simon J. Puglisi

Small Longest Tandem Scattered Subsequences

We consider the problem of identifying tandem scattered subsequences within a string. Our algorithm identifies a longest subsequence which occurs twice without overlap in a string. This algorithm is based on the Hunt-Szymanski algorithm,…

Data Structures and Algorithms · Computer Science 2020-06-26 Luís M. S. Russo, Alexandre P. Francisco

Grammar Index By Induced Suffix Sorting

Pattern matching is the most central task for text indices. Most recent indices leverage compression techniques to make pattern matching feasible for massive but highly-compressible datasets. Within this kind of indices, we propose a new…

Data Structures and Algorithms · Computer Science 2021-05-31 Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Pattern Matching on Grammar-Compressed Strings in Linear Time

The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…

Data Structures and Algorithms · Computer Science 2021-11-10 Moses Ganardi, Paweł Gawrychowski

NP-Completeness for the Space-Optimality of Double-Array Tries

Indexing a set of strings for prefix search or membership queries is a fundamental task with many applications such as information retrieval or database systems. A classic abstract data type for modelling such an index is a trie. Due to the…

Data Structures and Algorithms · Computer Science 2024-03-11 Hideo Bannai, Keisuke Goto, Shunsuke Kanda, Dominik Köppl

Quasi-Succinct Indices

Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller…

Information Retrieval · Computer Science 2012-06-20 Sebastiano Vigna

Subpath Queries on Compressed Graphs: a Survey

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text $T$, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in $T$…

Data Structures and Algorithms · Computer Science 2020-12-15 Nicola Prezza