Related papers: Closed Repeats
Repeat finding in strings has important applications in subfields such as computational biology. Surprisingly, no prior work on repeat finding has considered the constraint on the locality of repeats. In this paper, we propose and study…
In this paper we initiate the study of computing a maximal (not necessarily maximum) repeating pattern in a single input string, where the corresponding problems have been studied (e.g., a maximal common subsequence) only in two or more…
A string is closed if it has length 1 or has a nonempty border without internal occurrences. In this paper we introduce the definition of a \emph{maximal closed substring} (MCS), which is an occurrence of a closed substring that cannot be…
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) is proposed. Given a sequence $S$ of length $n$, a letter-repeated subsequence is a subsequence…
The cornerstone of any algorithm computing all repetitions in a string of length n in O(n) time is the fact that the number of runs (or maximal repetitions) is O(n). We give a simple proof of this result. As a consequence of our approach,…
Following (Kolpakov et al., 2013; Gawrychowski and Manea, 2015), we continue the study of \emph{$\alpha$-gapped repeats} in strings, defined as factors $uvu$ with $|uv|\leq \alpha |u|$. Our main result is the $O(\alpha n)$ bound on the…
A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend…
We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture (Kolpakov & Kucherov (FOCS '99)), which states that the…
In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the…
A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a…
A gapped repeat is a factor of the form $uvu$ where $u$ and $v$ are nonempty words. The period of the gapped repeat is defined as $|u|+|v|$. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one…
The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In…
Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by İleri et al., using a…
We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size $n$ that represents a string $s$ of length $N$, our algorithm computes all runs and…
This paper provides an upper bound for several subsets of maximal repeats and maximal pairs in compressed strings and also presents a formerly unknown relationship between maximal pairs and the run-length Burrows-Wheeler transform. This…
A closed string $u$ is either of length one or contains a border that occurs only as a prefix and as a suffix in $u$ and nowhere else within $u$. In this paper, we present fast $\mathcal{O}(n\log n)$ time algorithms to compute all…
An occurrence of a repeated substring $u$ in a string $S$ is called a net occurrence if extending the occurrence to the left or to the right decreases the number of occurrences to 1. The net frequency (NF) of a repeated substring $u$ in a…
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting…
We study the problem of computing a longest increasing subsequence in a sequence $S$ of $n$ distinct elements in the presence of persistent comparison errors. In this model, every comparison between two elements can return the wrong result…
The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a…
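
Several of the abstracts above rely on the definition of a closed string: a string of length 1, or one with a nonempty border (a proper prefix that is also a suffix) that has no internal occurrences. The following is a minimal sketch of that definition as a brute-force check, assuming Python; the function names `borders` and `is_closed` are illustrative and do not come from any of the listed papers, and the quadratic approach is chosen for readability, not to reflect the algorithms those papers describe.

```python
def borders(s):
    """Return the lengths of all nonempty proper borders of s."""
    return [k for k in range(1, len(s)) if s[:k] == s[-k:]]


def is_closed(s):
    """Brute-force test of the closed-string definition (illustrative only)."""
    if len(s) == 0:
        return False  # assumption: the empty string is not treated as closed
    if len(s) == 1:
        return True
    for k in borders(s):
        b = s[:k]
        # Internal occurrences start strictly between the prefix occurrence
        # (position 0) and the suffix occurrence (position len(s) - k).
        if all(s[i:i + k] != b for i in range(1, len(s) - k)):
            return True
    return False


if __name__ == "__main__":
    for word in ["a", "ab", "abab", "aaa", "abcaba"]:
        print(word, is_closed(word))
```

Under this definition, `abab` is closed because its border `ab` occurs only as a prefix and as a suffix, while `abcaba` is not, since its only border `a` also occurs at an internal position.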