Related papers: Simple Linear-time Repetition Factorization

Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization…

Data Structures and Algorithms · Computer Science 2020-12-11 Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

Lempel-Ziv (LZ77) Factorization in Sublinear Time

Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string $T$ from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the…

Data Structures and Algorithms · Computer Science 2025-06-19 Dominik Kempa, Tomasz Kociumaka

A faster algorithm for the construction of optimal factoring automata

The problem of constructing optimal factoring automata arises in the context of unification factoring for the efficient execution of logic programs. Given an ordered set of $n$ strings of length $m$, the problem is to construct a trie-like…

Data Structures and Algorithms · Computer Science 2024-04-04 Thomas Erlebach, Kleitos Papadopoulos

Palindromic k-Factorization in Pure Linear Time

Given a string $s$ of length $n$ over a general alphabet and an integer $k$, the problem is to decide whether $s$ is a concatenation of $k$ nonempty palindromes. Two previously known solutions for this problem work in time $O(kn)$ and…

Data Structures and Algorithms · Computer Science 2020-07-07 Mikhail Rubinchik, Arseny M. Shur
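The decision problem stated in this abstract (is $s$ a concatenation of $k$ nonempty palindromes?) can be illustrated with a brute-force check. This is a minimal sketch for small inputs only, not the linear-time algorithm of the paper; the function names are hypothetical.

```python
def is_palindrome(s: str) -> bool:
    """True if s reads the same forwards and backwards."""
    return s == s[::-1]


def is_k_palindromic(s: str, k: int) -> bool:
    """Brute force: can s be split into exactly k nonempty palindromes?

    Illustrative only; this is polynomial time, far from linear.
    """
    n = len(s)
    # After j rounds, `reachable` holds every prefix length of s that can be
    # written as a concatenation of exactly j nonempty palindromes.
    reachable = {0}
    for _ in range(k):
        next_reachable = set()
        for i in reachable:
            for j in range(i + 1, n + 1):
                if is_palindrome(s[i:j]):
                    next_reachable.add(j)
        reachable = next_reachable
    return n in reachable
```

For example, `is_k_palindromic("abacc", 2)` holds because of the split `aba` + `cc`, while `"abc"` admits no split into two palindromes.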

Faster Compact On-Line Lempel-Ziv Factorization

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N\log N)$ time and uses only $O(N\log\sigma)$ bits of working space, where $N$ is the length of the string and $\sigma$ is the size of…

Data Structures and Algorithms · Computer Science 2013-05-28 Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Time and Space Efficient Lempel-Ziv Factorization based on Run Length Encoding

We propose a new approach for calculating the Lempel-Ziv factorization of a string, based on run length encoding (RLE). We present a conceptually simple off-line algorithm based on a variant of suffix arrays, as well as an on-line algorithm…

Data Structures and Algorithms · Computer Science 2015-03-20 Jun'ichi Yamamoto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets

We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length $N$ in linear time, that utilizes only $N\log N + O(1)$ bits of working space, i.e., a single integer array, for constant size integer…

Data Structures and Algorithms · Computer Science 2013-10-08 Keisuke Goto, Hideo Bannai

Efficient Index for Weighted Sequences

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to…

Data Structures and Algorithms · Computer Science 2016-02-04 Carl Barton, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space

The notions of synchronizing and partitioning sets are recently introduced variants of locally consistent parsings with great potential in problem-solving. In this paper we propose a deterministic algorithm that constructs for a given…

Data Structures and Algorithms · Computer Science 2024-04-23 Dmitry Kosolobov, Nikita Sivukhin

Optimal Construction of Compressed Indexes for Highly Repetitive Texts

We propose algorithms that, given the input string of length $n$ over integer alphabet of size $\sigma$, construct the Burrows-Wheeler transform (BWT), the permuted longest-common-prefix (PLCP) array, and the LZ77 parsing in…

Data Structures and Algorithms · Computer Science 2020-12-09 Dominik Kempa

On Longest Repeat Queries

Repeat finding in strings has important applications in subfields such as computational biology. Surprisingly, all prior work on repeat finding did not consider the constraint on the locality of repeats. In this paper, we propose and study…

Data Structures and Algorithms · Computer Science 2015-01-27 Atalay Mert İleri, M. Oğuzhan Külekci, Bojian Xu

String factorisations with maximum or minimum dimension

In this paper we consider two problems concerning string factorisation. Specifically given a string $w$ and an integer $k$ find a factorisation of $w$ where each factor has length bounded by $k$ and has the minimum (the FmD problem) or the…

Data Structures and Algorithms · Computer Science 2019-12-24 Angelo Monti, Blerina Sinaimeri

Cartesian Tree Matching and Indexing

We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length $n$ and a…

Data Structures and Algorithms · Computer Science 2019-05-23 Sung Gwan Park, Amihood Amir, Gad M. Landau, Kunsoo Park
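The matching criterion in this abstract (two strings match if they have the same Cartesian trees) can be sketched by comparing parent arrays. The sketch below assumes the usual min-rooted Cartesian tree with ties broken toward the leftmost minimum, which the truncated abstract does not spell out; the function names are hypothetical, and this is not the paper's matching or indexing algorithm.

```python
from typing import List, Sequence


def cartesian_tree_parents(seq: Sequence[int]) -> List[int]:
    """Parent index of each position in the min-rooted Cartesian tree.

    The root gets parent -1. Assumption: among equal values, the leftmost
    one is closer to the root. Built in linear time with a stack that holds
    the rightmost path of the tree.
    """
    parent = [-1] * len(seq)
    stack: List[int] = []
    for i, x in enumerate(seq):
        last = -1
        # Pop every strictly larger element; the last one popped becomes
        # the left child of position i.
        while stack and seq[stack[-1]] > x:
            last = stack.pop()
        if last != -1:
            parent[last] = i
        if stack:
            # Position i becomes the right child of the new stack top.
            parent[i] = stack[-1]
        stack.append(i)
    return parent


def cartesian_tree_match(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a and b have the same length and the same Cartesian tree shape."""
    return len(a) == len(b) and cartesian_tree_parents(a) == cartesian_tree_parents(b)
```

Under these assumptions, `[3, 1, 2]` and `[5, 2, 9]` match because both have their minimum in the middle with one child on each side, whereas `[3, 1, 2]` and `[1, 2, 3]` do not.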

Efficient Lyndon factorization of grammar compressed text

We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the…

Data Structures and Algorithms · Computer Science 2013-04-29 Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Longest Unbordered Factors on Run-Length Encoded Strings

A border of a string is a non-empty proper prefix of the string that is also a suffix. A string is unbordered if it has no border. The longest unbordered factor is a fundamental notion in stringology, closely related to string periodicity.…

Data Structures and Algorithms · Computer Science 2025-07-23 Shoma Sekizaki, Takuya Mieno
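The definitions of border and unbordered string given in this abstract translate directly into a brute-force check. This is a minimal sketch on plain (not run-length encoded) strings, intended only to make the definitions concrete; it is not the paper's algorithm, and the function names are hypothetical.

```python
def has_border(s: str) -> bool:
    """True if some non-empty proper prefix of s is also a suffix of s."""
    return any(s[:k] == s[-k:] for k in range(1, len(s)))


def longest_unbordered_factor(s: str) -> str:
    """Brute force: return the leftmost longest substring of s with no border.

    Lengths are tried from longest to shortest, so the first hit is a
    longest unbordered factor. Returns "" only for the empty input.
    """
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            factor = s[start:start + length]
            if not has_border(factor):
                return factor
    return ""
```

For instance, `abab` has the border `ab`, and each of its length-3 factors is bordered by a single letter, so its longest unbordered factor has length 2.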

On Stabbing Queries for Generalized Longest Repeat

A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend…

Data Structures and Algorithms · Computer Science 2015-11-10 Bojian Xu

Linear pattern matching on sparse suffix trees

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse…

Data Structures and Algorithms · Computer Science 2015-03-19 Roman Kolpakov, Gregory Kucherov, Tatiana Starikovskaya

Linear Index for Logarithmic Search-Time for any String under any Internal Node in Suffix Trees

Suffix trees are a key and efficient data structure for solving string problems. A suffix tree is a compressed trie containing all the suffixes of a given text of length $n$ with a linear construction cost. In this work, we introduce an…

Data Structures and Algorithms · Computer Science 2024-06-04 Anas Al-okaily

Indexing Weighted Sequences: Neat and Efficient

In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or…

Data Structures and Algorithms · Computer Science 2017-08-28 Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, Jakub Radoszewski

The Complexity of Dynamic LZ77 is $\tilde{\Theta}(n^{2/3})$

The Lempel-Ziv 77 (LZ77) factorization is a fundamental compression scheme widely used in text processing and data compression. In this work, we investigate the time complexity of maintaining the LZ77 factorization of a dynamic string. By…

Data Structures and Algorithms · Computer Science 2025-10-28 Itai Boneh, Shay Golan, Matan Kraus