Simple Linear-time Repetition Factorization

Data Structures and Algorithms 2024-08-09 v1

Abstract

A factorization $f_1, \ldots, f_m$ of a string $w$ of length $n$ is called a repetition factorization of $w$ if each $f_i$ is a repetition, i.e., $f_i$ is of the form $x^k x'$, where $x$ is a non-empty string, $x'$ is a (possibly empty) proper prefix of $x$, and $k \geq 2$. Dumitran et al. [SPIRE 2015] presented an $O(n)$-time and $O(n)$-space algorithm for computing an arbitrary repetition factorization of a given string of length $n$. Their algorithm relies heavily on the Union-Find data structure on trees proposed by Gabow and Tarjan [JCSS 1985], which works in linear time on the word RAM model, and on an interval stabbing data structure of Schmidt [ISAAC 2009]. In this paper, we explore further combinatorial insights into the problem and present a simple algorithm that computes an arbitrary repetition factorization of a given string of length $n$ in $O(n)$ time, without relying on data structures for Union-Find or interval stabbing. Our algorithm follows the approach of Inoue et al. [ToCS 2022], which computes the smallest/largest repetition factorization in $O(n \log n)$ time.
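To make the definition concrete, the following is a minimal Python sketch of a naive baseline, not the linear-time algorithm of the paper. It uses the fact that a string of length $\ell$ is a repetition exactly when its smallest period is at most $\ell/2$, and it finds some repetition factorization by a quadratic-time dynamic program. The function names (`border_array`, `is_repetition`, `repetition_factorization`) are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional


def border_array(s: str) -> List[int]:
    """fail[j] = length of the longest proper border of s[:j] (KMP failure function)."""
    n = len(s)
    fail = [0] * (n + 1)
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k]
        if s[i] == s[k]:
            k += 1
        fail[i + 1] = k
    return fail


def is_repetition(s: str) -> bool:
    """True iff s = x^k x' with k >= 2, i.e. the smallest period p satisfies 2p <= |s|."""
    n = len(s)
    return n >= 2 and 2 * (n - border_array(s)[n]) <= n


def repetition_factorization(w: str) -> Optional[List[str]]:
    """Return some repetition factorization of w, or None if none exists.

    Naive O(n^2)-time dynamic program (illustration only, assumed baseline).
    """
    n = len(w)
    ok = [False] * (n + 1)   # ok[i]: the suffix w[i:] has a repetition factorization
    nxt = [0] * (n + 1)      # nxt[i]: end position of the first factor of w[i:]
    ok[n] = True
    for i in range(n - 2, -1, -1):
        fail = border_array(w[i:])
        for length in range(2, n - i + 1):
            # w[i:i+length] has smallest period length - fail[length]
            if ok[i + length] and 2 * (length - fail[length]) <= length:
                ok[i] = True
                nxt[i] = i + length
                break
    if not ok[0]:
        return None
    factors = []
    i = 0
    while i < n:
        factors.append(w[i:nxt[i]])
        i = nxt[i]
    return factors
```

For instance, `repetition_factorization("aaabab")` yields the factors `aa` and `abab`, whereas `"aabab"` admits no repetition factorization and the function returns `None`.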

Cite

@article{arxiv.2408.04253,
  title  = {Simple Linear-time Repetition Factorization},
  author = {Yuki Yonemoto and Shunsuke Inenaga},
  journal= {arXiv preprint arXiv:2408.04253},
  year   = {2024}
}

Comments

Accepted for SPIRE 2024