Related papers: A Linear Time Algorithm for Seeds Computation
The notion of the cover is a generalization of a period of a string, and there are linear-time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a…
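To make the notion of a cover concrete, the following minimal sketch checks whether a candidate string covers a text and finds the shortest cover by trying prefixes in increasing length. It is a brute-force illustration, not one of the linear-time algorithms the abstract refers to, and the function names `is_cover` and `shortest_cover` are hypothetical.

```python
def is_cover(c: str, t: str) -> bool:
    """Return True if every position of t lies inside some occurrence of c."""
    if not c or len(c) > len(t):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    start = t.find(c)
    while start != -1:
        if start > covered_up_to:
            return False  # a gap that no occurrence of c reaches
        covered_up_to = start + len(c)
        start = t.find(c, start + 1)  # allow overlapping occurrences
    return covered_up_to == len(t)


def shortest_cover(t: str) -> str:
    """Brute force: a cover must be a prefix of t, so try prefixes by length."""
    for length in range(1, len(t) + 1):
        if is_cover(t[:length], t):
            return t[:length]
    return t  # only reached for the empty string
```

For instance, `aba` covers `abaababa` through its overlapping occurrences at positions 0, 3 and 5, whereas `abcab` has no cover shorter than itself.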
The Cover Suffix Tree (CST) of a string $T$ is the suffix tree of $T$ with additional explicit nodes corresponding to halves of square substrings of $T$. In the CST an explicit node corresponding to a substring $C$ of $T$ is annotated with…
An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their…
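The definition of a minimal absent word can be illustrated with a small brute-force sketch. It relies on the observation that, for a word of length at least 2, all proper factors occur exactly when its longest proper prefix and longest proper suffix both occur. The function name and the explicit `alphabet` parameter are assumptions made for this example; this is not the method of the paper.

```python
def minimal_absent_words(y: str, alphabet: str) -> set[str]:
    """Brute-force enumeration of the minimal absent words of y over alphabet."""
    # All factors (substrings) of y, including the empty word.
    factors = {y[i:j] for i in range(len(y)) for j in range(i + 1, len(y) + 1)}
    factors.add("")
    maws = set()
    for u in factors:            # u is the longest proper prefix of the candidate
        for b in alphabet:
            w = u + b
            # w is absent, and its longest proper suffix occurs in y
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return maws
```

With `y = "abaab"` over the alphabet `"ab"`, this yields the four words `aaa`, `aaba`, `bab` and `bb`.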
We consider here the problem of chaining seeds in ordered trees. Seeds are mappings between two trees Q and T and a chain is a subset of non-overlapping seeds that is consistent with respect to postfix order and ancestrality. This problem…
Given two strings $T$ and $S$ and a set of strings $P$, for each string $p \in P$, consider the unique substrings of $T$ that have $p$ as their prefix and $S$ as their suffix. Two problems then come to mind; the first problem being the…
This paper addresses the problem of finding a representation of a subtree distance, which is an extension of the tree metric. We show that a minimal representation is uniquely determined by a given subtree distance, and give a linear time…
Suffix trees are a key and efficient data structure for solving string problems. A suffix tree is a compressed trie containing all the suffixes of a given text of length $n$ with a linear construction cost. In this work, we introduce an…
We consider the problem of finding repetitive structures and inherent patterns in a given string $s$ of length $n$ over a finite totally ordered alphabet. A border $u$ of a string $s$ is both a prefix and a suffix of $s$…
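As an illustration of borders, the sketch below computes the classical border array (the failure function used in Knuth-Morris-Pratt matching) and uses it to list all borders of a string. It assumes that only proper borders are of interest, that is, borders shorter than the string itself; the function names are hypothetical.

```python
def border_array(s: str) -> list[int]:
    """b[i] is the length of the longest proper border of s[:i + 1]."""
    b = [0] * len(s)
    k = 0  # length of the current longest border
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = b[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        b[i] = k
    return b


def borders(s: str) -> list[str]:
    """All proper, non-empty borders of s, longest first."""
    b = border_array(s)
    out = []
    k = b[-1] if s else 0
    while k > 0:
        out.append(s[:k])
        k = b[k - 1]
    return out
```

For `abaababa` the border array is `[0, 0, 1, 1, 2, 3, 2, 3]`, and the borders are `aba` and `a`.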
Covers, being one of the most popular forms of regularity in strings, have drawn much attention over time. In this paper, we focus on the problem of linear-time inference of strings from cover arrays using the smallest possible alphabet.…
For a text given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering…
The linear complexity of a sequence $s$ is one of the measures of its predictability. It represents the smallest degree of a linear recursion which the sequence satisfies. There are several algorithms to find the linear complexity of a…
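One well-known way to compute the linear complexity is the Berlekamp-Massey algorithm. The sketch below is a minimal version for binary sequences; restricting to GF(2) is an assumption made here, since the abstract does not fix the underlying field, and the function name is hypothetical.

```python
def linear_complexity_gf2(s: list[int]) -> int:
    """Length of the shortest LFSR over GF(2) that generates the bits in s."""
    n = len(s)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1       # current complexity; index of the last length change
    for i in range(n):
        # Discrepancy between s[i] and the bit predicted by the current LFSR.
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]  # c(x) += x^shift * b(x)
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L
```

For example, the alternating sequence `1, 0, 1, 0, 1, 0` satisfies $s_i = s_{i-2}$ and has linear complexity 2, while `0, 0, 0, 1` has linear complexity 4.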
Tree kernels have been proposed for use in many areas, such as the automatic learning of natural language applications. In this paper, we propose a new linear time algorithm based on the concept of weighted tree automata for SubTree kernel…
We present a new algorithm for iterating over all permutations of a sequence. The algorithm leverages elementary $O(1)$ operations on recursive lists. As a result, no new nodes are allocated during the computation. Instead, all elements are…
We develop a combinatorial approach to the study of semigroups and monoids with finite presentations satisfying small overlap conditions. In contrast to existing geometric methods, our approach facilitates a sequential left-right analysis…
We show that the number of distinct squares in a packed string of length $n$ over an alphabet of size $\sigma$ can be computed in $O(n/\log_\sigma n)$ time in the word-RAM model. This paper is the first to introduce a sublinear-time…
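For reference, the quantity being counted can be pinned down with a brute-force sketch: a square is a non-empty string of the form $uu$, and distinct squares are counted as distinct strings regardless of how often they occur. This cubic-time illustration is unrelated to the sublinear word-RAM technique of the paper, and the function name is hypothetical.

```python
def distinct_squares(t: str) -> set[str]:
    """All distinct substrings of t of the form uu with u non-empty."""
    squares = set()
    n = len(t)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if t[i:i + half] == t[i + half:i + 2 * half]:
                squares.add(t[i:i + 2 * half])
    return squares
```

The string `aaaa` contains exactly two distinct squares, `aa` and `aaaa`, even though `aa` occurs three times.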
Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse…
We present algorithms that run in linear time on pointer machines for a collection of problems, each of which either directly or indirectly requires the evaluation of a function defined on paths in a tree. These problems previously had…
A Lyndon word is a primitive string which is lexicographically smallest among cyclic permutations of its characters. Lyndon words are used for constructing bases in free Lie algebras, constructing de Bruijn sequences, finding the…
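The definition can be checked directly: a non-empty word is a Lyndon word exactly when it is strictly smaller than each of its proper rotations, and strictness is what rules out non-primitive words. The sketch below is a quadratic-time test written straight from that definition; the function name is hypothetical.

```python
def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than all its proper rotations."""
    if not w:
        return False
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))
```

For example, `aab` is a Lyndon word, `aba` is not (its rotation `aab` is smaller), and `abab` is not because it is not primitive.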
Folded Reed-Solomon codes are an explicit family of codes that achieve the optimal trade-off between rate and error-correction capability: specifically, for any $\varepsilon > 0$, the author and Rudra (2006,08) presented an $n^{O(1/\varepsilon)}$ time…
Kosaraju in "Computation of squares in a string" briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and…