
Bijective BWT based compression schemes

Data Structures and Algorithms 2024-08-20 v2

Abstract

We investigate properties of the bijective Burrows-Wheeler transform (BBWT). We show that for any string $w$, a bidirectional macro scheme of size $O(r_B)$ can be induced from the BBWT of $w$, where $r_B$ is the number of maximal character runs in the BBWT. We also show that $r_B = O(z \log^2 n)$, where $n$ is the length of $w$ and $z$ is the number of Lempel-Ziv 77 factors of $w$. Then, we show a separation between BBWT and BWT by a family of strings with $r_B = \Omega(\log n)$ but having only $r = 2$ maximal character runs in the standard Burrows-Wheeler transform (BWT). However, we observe that the smallest $r_B$ among all cyclic rotations of $w$ is always at most $r$. While an $o(n^2)$ algorithm for computing an optimal rotation giving the smallest $r_B$ is still open, we show how to compute the Lyndon factorizations (a component for computing the BBWT) of all cyclic rotations in $O(n)$ time. Furthermore, we conjecture that we can transform two strings having the same Parikh vector to each other by BBWT and rotation operations, and prove this conjecture for the case of binary alphabets and permutations.
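To make the quantities in the abstract concrete, the following is a minimal brute-force sketch and not the paper's algorithms. It assumes the usual textbook definitions: the Lyndon factorization is computed with Duval's algorithm, the BBWT is obtained by sorting all cyclic rotations of all Lyndon factors in omega-order (comparing their infinite repetitions) and reading off last characters, and the BWT is taken over the sorted cyclic rotations of the string without a terminator symbol. All function names are illustrative, and the rotation search is a naive loop over every rotation, not the $O(n)$-time Lyndon factorization result mentioned above.

```python
def lyndon_factorization(w):
    """Duval's algorithm: split w into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            # Restart the comparison pointer on a strict increase,
            # otherwise advance it along the current periodic prefix.
            k = i if w[k] < w[j] else k + 1
            j += 1
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


def runs(s):
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def bbwt(w):
    """Brute-force BBWT: omega-sort the rotations of all Lyndon factors."""
    n = len(w)
    rotations = [f[i:] + f[:i]
                 for f in lyndon_factorization(w)
                 for i in range(len(f))]

    def omega_key(r):
        # A prefix of length 2n of the infinite repetition of r is enough
        # to decide omega-order between any two rotations (|u| + |v| <= 2n).
        reps = -(-2 * n // len(r))  # ceiling division
        return (r * reps)[:2 * n]

    rotations.sort(key=omega_key)
    return "".join(r[-1] for r in rotations)


def bwt(w):
    """BWT without terminator: last characters of the sorted cyclic rotations."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots)


def min_rb_over_rotations(w):
    """Smallest r_B over all cyclic rotations of a non-empty string w (naive)."""
    return min(runs(bbwt(w[i:] + w[:i])) for i in range(len(w)))


if __name__ == "__main__":
    w = "banana"
    print(lyndon_factorization(w))   # ['b', 'an', 'an', 'a']
    print(bbwt(w), runs(bbwt(w)))    # annbaa 4
    print(bwt(w), runs(bwt(w)))      # nnbaaa 3
    print(min_rb_over_rotations(w))  # 3
```

For `banana`, $r_B = 4$ while $r = 3$, and the rotation `abanan` (itself a Lyndon word) brings $r_B$ down to 3, matching the observation that the best rotation never has more runs than the BWT.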

Cite

@article{arxiv.2406.16475,
  title  = {Bijective BWT based compression schemes},
  author = {Golnaz Badkobeh and Hideo Bannai and Dominik Köppl},
  journal= {arXiv preprint arXiv:2406.16475},
  year   = {2024}
}

Comments

Slightly extended version of paper accepted to SPIRE 2024