Related papers: Detecting regularities on grammar-compressed strin…
We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the…
Here we study the complexity of string problems as a function of the size of a program that generates the input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing…
It was recently proved that any SLP generating a given string $w$ can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with…
In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size…
It was recently proved that any Straight-Line Program (SLP) generating a given string can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We generalize this proof to a general class of grammars we…
We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm…
Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…
The convolution between a text string $S$ of length $N$ and a pattern string $P$ of length $m$ can be computed in $O(N \log m)$ time by FFT. It is known that various types of approximate string matching problems are reducible to…
The longest square subsequence (LSS) problem consists of computing a longest subsequence of a given string $S$ that is a square, i.e., a longest subsequence of form $XX$ appearing in $S$. It is known that an LSS of a string $S$ of length…
We present simple and efficient algorithms for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size $n$ that represents string $T$, we present an $O(qn)$…
Length-$q$ substrings, or $q$-grams, can represent important characteristics of text data, and determining the frequencies of all $q$-grams contained in the data is an important problem with many applications in the field of data mining and…
In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e., context-free grammars generating exactly one string; the term fully means that both the pattern…
A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives $T$ only, which can be considered as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n…
We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $\delta$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are…
Palindromes are strings that read the same forward and backward. The computation of palindromic structures within strings is a fundamental problem in string algorithms, being motivated by potential applications in formal language theory and…
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a…
We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string.…
Described are two algorithms to find long approximate palindromes in a string, for example a DNA sequence. A simple algorithm requires $O(n)$ space and almost always runs in $O(k \cdot n)$ time, where $n$ is the length of the string and $k$ is the…
We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing…
Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of view. It is known that the length of the longest palindromic substrings (LPSs) of a given string…
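Several of the abstracts above describe a straight-line program (SLP) as a context-free grammar in Chomsky normal form that derives exactly one string. The following is a minimal Python sketch of that idea, not taken from any of the listed papers: the rule table `RULES`, the variable names `X1`..`X6`, and the helper functions `length`, `char_at`, and `expand` are all hypothetical and chosen for illustration only. It shows how the length of the derived string and a single character can be obtained from the grammar without decompressing it.

```python
from functools import lru_cache

# Hypothetical toy SLP (an assumption for illustration): each variable maps
# either to a single terminal character or to a pair of earlier variables.
RULES = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # derives "ab"
    "X4": ("X3", "X1"),  # derives "aba"
    "X5": ("X4", "X3"),  # derives "abaab"
    "X6": ("X5", "X4"),  # derives "abaababa"
}


@lru_cache(maxsize=None)
def length(var):
    """Length of the string derived by var, computed once per variable."""
    rhs = RULES[var]
    if isinstance(rhs, str):
        return 1
    left, right = rhs
    return length(left) + length(right)


def char_at(var, i):
    """Return the i-th character (0-based) derived by var without expanding it."""
    if not 0 <= i < length(var):
        raise IndexError("position out of range")
    rhs = RULES[var]
    while not isinstance(rhs, str):
        left, right = rhs
        if i < length(left):
            var = left           # the position falls in the left child
        else:
            i -= length(left)    # skip over the left child
            var = right
        rhs = RULES[var]
    return rhs


def expand(var):
    """Fully decompress var; only sensible for small examples like this one."""
    rhs = RULES[var]
    if isinstance(rhs, str):
        return rhs
    return expand(rhs[0]) + expand(rhs[1])
```

In this sketch, `char_at` descends one level of the derivation tree per step, so its cost depends on the height of the grammar rather than on the length of the derived string, which is one reason balanced SLPs are of interest.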