Related papers: String Inference from the LCP Array
The suffix tree is a very important data structure in string processing, but it suffers from huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three…
The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows one to efficiently compute some combinatorial properties of a string useful in several…
When augmented with the longest common prefix (LCP) array and some other structures, the suffix array can solve many string processing problems in optimal time and space. A compressed representation of the LCP array is also one of the main…
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an…
The LCP array is an important tool in stringology, allowing one to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the…
Suffix arrays encode the lexicographical order of all suffixes of a text and are often combined with the Longest Common Prefix array (LCP) to simulate navigational queries on the suffix tree in reduced space. In space-critical applications…
We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer…
We show how to modify the linear-time construction algorithm for suffix arrays based on induced sorting (Nong et al., DCC'09) such that it computes the array of longest common prefixes (LCP-array) as well. Practical tests show that this…
Covers, being one of the most popular forms of regularity in strings, have drawn much attention over time. In this paper, we focus on the problem of linear-time inference of strings from cover arrays using the smallest possible alphabet.…
Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for…
Suffix arrays and LCP arrays are among the most fundamental data structures and are widely used for various kinds of string processing. We consider two problems for a read-only string of length $N$ over an integer alphabet $[1, \dots, \sigma]$ for…
Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for…
The Lyndon array stores, at each position of a word, the length of the longest maximal Lyndon subword starting at that position, and plays an important role in combinatorics on words, for example in the construction of fundamental data…
Indexing very large collections of strings, such as those produced by the widespread next-generation sequencing technologies, relies heavily on multistring generalizations of the Burrows-Wheeler Transform (BWT): large requirements of…
Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely to a greater extent on efficient alignment-free approaches. Most of the alignment-free approaches require the…
We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be…
We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has…
Given a set of $k$ strings $I$, their longest common subsequence (LCS) is the longest string that is a subsequence of every string in $I$. A data structure for this problem preprocesses $I$ into a data structure such that the…
The k-spectrum of a string is the set of all distinct substrings of length k occurring in the string. K-spectra have many applications in bioinformatics including pseudoalignment and genome assembly. The Spectral Burrows-Wheeler Transform…
The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection…
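Most of the entries above revolve around the suffix array (SA) and the LCP array. The title paper studies the reverse direction (recovering a string from its LCP array); the forward direction, computing the LCP array of a given text, is the simple one. The sketch below illustrates that forward direction under stated assumptions: it uses a naive suffix sort for brevity and Kasai et al.'s linear-time LCP construction, and it is not the method of any paper listed here. The function names `suffix_array` and `lcp_array` are hypothetical, and the convention `lcp[0] = 0` is an assumption.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction (assumption, for illustration only): sort suffix
    # start positions by the suffixes themselves. O(n^2 log n) worst case.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.'s linear-time LCP construction.

    lcp[i] is the length of the longest common prefix of the suffixes
    starting at sa[i - 1] and sa[i]; lcp[0] is 0 by convention.
    """
    n = len(text)
    rank = [0] * n
    for i, pos in enumerate(sa):
        rank[pos] = i  # rank[pos] = index of suffix `pos` in the SA
    lcp = [0] * n
    h = 0
    for pos in range(n):  # visit suffixes in text order
        r = rank[pos]
        if r > 0:
            prev = sa[r - 1]  # lexicographic predecessor of suffix `pos`
            while pos + h < n and prev + h < n and text[pos + h] == text[prev + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # suffix pos+1 shares at least h-1 chars with its predecessor
        else:
            h = 0  # the smallest suffix has no predecessor
    return lcp


text = "banana"
sa = suffix_array(text)
print(sa)                   # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```

The linear bound comes from the fact that `h` decreases by at most one per step while never exceeding `n`, so the total number of character comparisons is O(n).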