Related papers: Sampled Longest Common Prefix Array

Lightweight LCP-Array Construction in Linear Time

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three…

Data Structures and Algorithms · Computer Science 2010-12-21 Simon Gog, Enno Ohlebusch

Wee LCP

We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer…

Data Structures and Algorithms · Computer Science 2010-02-19 Johannes Fischer

Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an…

Data Structures and Algorithms · Computer Science 2016-03-09 German Tischler

In-Place Sparse Suffix Sorting

Suffix arrays encode the lexicographical order of all suffixes of a text and are often combined with the Longest Common Prefix array (LCP) to simulate navigational queries on the suffix tree in reduced space. In space-critical applications…

Data Structures and Algorithms · Computer Science 2017-11-02 Nicola Prezza

String Inference from the LCP Array

The suffix array, perhaps the most important data structure in modern string processing, is often augmented with the longest common prefix (LCP) array which stores the lengths of the LCPs for lexicographically adjacent suffixes of a string.…

Data Structures and Algorithms · Computer Science 2017-02-27 Juha Kärkkäinen, Marcin Piątkowski, Simon J. Puglisi

Compressed Spaced Suffix Arrays

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…

Data Structures and Algorithms · Computer Science 2014-03-11 Travis Gagie, Giovanni Manzini, Daniel Valenzuela

Lightweight LCP Construction for Very Large Collections of Strings

The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows to efficiently compute some combinatorial properties of a string useful in several…

Data Structures and Algorithms · Computer Science 2016-05-16 Anthony J. Cox, Fabio Garofalo, Giovanna Rosone, Marinella Sciortino

Inducing the LCP-Array

We show how to modify the linear-time construction algorithm for suffix arrays based on induced sorting (Nong et al., DCC'09) such that it computes the array of longest common prefixes (LCP-array) as well. Practical tests show that this…

Data Structures and Algorithms · Computer Science 2011-01-19 Johannes Fischer

The colored longest common prefix array computed via sequential scans

Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the…

Data Structures and Algorithms · Computer Science 2018-07-23 F. Garofalo, G. Rosone, M. Sciortino, D. Verzotto

Relative Suffix Trees

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into…

Data Structures and Algorithms · Computer Science 2017-12-18 Andrea Farruggia, Travis Gagie, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Sampling the suffix array with minimizers

Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches…

Data Structures and Algorithms · Computer Science 2014-12-04 Szymon Grabowski, Marcin Raniszewski

Suffixient Arrays: a New Efficient Suffix Array Compression Technique

The Suffix Array is a classic text index enabling on-line pattern matching queries via simple binary search. The main drawback of the Suffix Array is that it takes linear space in the text's length, even if the text itself is extremely…

Data Structures and Algorithms · Computer Science 2025-03-19 Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, Nicola Prezza

Computing the LCP Array of a Labeled Graph

The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the…

Data Structures and Algorithms · Computer Science 2024-04-23 Jarno Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, Nicola Prezza

Computing matching statistics on Wheeler DFAs

Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for…

Data Structures and Algorithms · Computer Science 2023-01-16 Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, Marinella Sciortino

Time and Memory Efficient Lempel-Ziv Compression Using Suffix Arrays

The well-known dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are the basis of several universal lossless compression techniques. These algorithms are asymmetric regarding encoding/decoding time and memory requirements, with…

Data Structures and Algorithms · Computer Science 2009-12-31 Artur Ferreira, Arlindo Oliveira, Mario Figueiredo

Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for…

Data Structures and Algorithms · Computer Science 2024-07-08 Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis, Hilde Verbeek

Using Compressed Suffix-Arrays for a Compact Representation of Temporal-Graphs

Temporal graphs represent binary relationships that change along time. They can model the dynamism of, for example, social and communication networks. Temporal graphs are defined as sets of contacts that are edges tagged with the temporal…

Data Structures and Algorithms · Computer Science 2019-01-01 Nieves R. Brisaboa, Diego Caro, Antonio Fariña, M. Andrea Rodriguez

Longest Common Extensions with Recompression

Given two positions $i$ and $j$ in a string $T$ of length $N$, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at $i$ and $j$. A compressed LCE data structure is a data…

Data Structures and Algorithms · Computer Science 2016-11-22 Tomohiro I

Faster run-length compressed suffix arrays

We first review how we can store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $\sigma$ whose Burrows-Wheeler Transform (BWT) consists of $r$ runs in $O \left( r \log (n /…

Data Structures and Algorithms · Computer Science 2025-04-22 Nathaniel K. Brown, Travis Gagie, Giovanni Manzini, Gonzalo Navarro, Marinella Sciortino

Parallel Suffix Array Construction by Accelerated Sampling

A deterministic BSP algorithm for constructing the suffix array of a given string is presented, based on a technique which we call accelerated sampling. It runs in optimal O(n/p) local computation and communication, and requires a near…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-26 Matthew Felice Pace, Alexander Tiskin