Related papers: Linear Algorithms for Computing the Lyndon Border …
The Lyndon array stores, at each position of a word, the length of the longest maximal Lyndon subword starting at that position, and plays an important role in combinatorics on words, for example in the construction of fundamental data…
In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where…
We first describe three algorithms for computing the Lyndon array that have been suggested in the literature, but for which no structured exposition has been given. Two of these algorithms execute in quadratic time in the worst case, the…
We present the first linear time algorithm to construct the $2n$-bit version of the Lyndon array for a string of length $n$ using only $o(n)$ bits of working space. A simpler variant of this algorithm computes the plain ($n\lg n$-bit)…
For a text given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering…
A border of a string is a non-empty proper prefix of the string that is also a suffix. A string is unbordered if it has no border. The longest unbordered factor is a fundamental notion in stringology, closely related to string periodicity.…
A Lyndon word is a primitive string which is lexicographically smallest among cyclic permutations of its characters. Lyndon words are used for constructing bases in free Lie algebras, constructing de Bruijn sequences, finding the…
We study the fundamental question of how efficiently suffix array entries can be accessed when the array cannot be stored explicitly. The suffix array $SA_T[1..n]$ of a text $T$ of length $n$ encodes the lexicographic order of its suffixes…
A \itbf{cover} of a string $x = x[1..n]$ is a proper substring $u$ of $x$ such that $x$ can be constructed from possibly overlapping instances of $u$. A recent paper \cite{FIKPPST13} relaxes this definition --- an \itbf{enhanced cover} $u$…
A seed in a word is a relaxed version of a period in which the occurrences of the repeating subword may overlap. We show a linear-time algorithm computing a linear-size representation of all the seeds of a word (the number of seeds might be…
The Suffix Array is a classic text index enabling on-line pattern matching queries via simple binary search. The main drawback of the Suffix Array is that it takes linear space in the text's length, even if the text itself is extremely…
An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their…
Given two strings $T$ and $S$ and a set of strings $P$, for each string $p \in P$, consider the unique substrings of $T$ that have $p$ as their prefix and $S$ as their suffix. Two problems then come to mind; the first problem being the…
The suffix array is a fundamental data structure for many applications that involve string searching and data compression. Designing time/space-efficient suffix array construction algorithms has attracted significant attention and…
In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the…
The notion of Lyndon word and Lyndon factorization has shown to have unexpected applications in theory as well in developing novel algorithms on words. A counterpart to these notions are those of inverse Lyndon word and inverse Lyndon…
The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a text string $T$ of length $n$ has $O(n)$ nodes and edges, and the string label of each edge is encoded by a pair of positions in…
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we…
The Longest Common Subsequence (LCS) is a fundamental string similarity measure, and computing the LCS of two strings is a classic algorithms question. A textbook dynamic programming algorithm gives an exact algorithm in quadratic time, and…
This paper investigates the approximability of the Longest Common Subsequence (LCS) problem. The fastest algorithm for solving the LCS problem exactly runs in essentially quadratic time in the length of the input, and it is known that under…