Efficient Computation of Sequence Mappability

Panagiotis Charalampopoulos; Costas S. Iliopoulos; Tomasz Kociumaka; Solon P. Pissis; Jakub Radoszewski; Juliusz Straszyński

Data Structures and Algorithms 2021-06-18 v3

Abstract

In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=\mathcal{O}(1)$, works in $\mathcal{O}(n)$ space and, with high probability, in $\mathcal{O}(n \cdot \min\{m^k,\log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $\mathcal{O}(n^2)$-time algorithms to compute all $(k,m)$-mappability tables for a fixed $m$ and all $k\in \{0,\ldots,m\}$ or a fixed $k$ and all $m\in\{k,\ldots,n\}$. Finally, we show that, for $k,m = \Theta(\log n)$, the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper that was presented at SPIRE 2018.
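To make the problem statement concrete, the following is a minimal Python sketch of the $(k,m)$-mappability table as defined above. It is not the algorithm of the paper: `mappability_naive` applies the definition directly in $\mathcal{O}(n^2 m)$ time, and `mappability_diagonal` uses a standard sliding-window count along each offset $d = j - i$ to reach $\mathcal{O}(n^2)$ time for a single pair $(k,m)$. The function names and the 0-based indexing of positions are assumptions made for illustration only.

```python
from typing import List


def mappability_naive(text: str, m: int, k: int) -> List[int]:
    """Direct application of the definition (0-based positions, an assumption).

    table[i] = number of j != i such that text[i:i+m] and text[j:j+m]
    differ in at most k positions.
    """
    starts = len(text) - m + 1  # number of length-m substrings
    if starts <= 0:
        return []
    table = [0] * starts
    for i in range(starts):
        for j in range(i + 1, starts):
            mismatches = sum(
                1 for a, b in zip(text[i:i + m], text[j:j + m]) if a != b
            )
            if mismatches <= k:
                # The relation is symmetric, so the pair counts for both entries.
                table[i] += 1
                table[j] += 1
    return table


def mappability_diagonal(text: str, m: int, k: int) -> List[int]:
    """Same table in O(n^2) time via a sliding window per offset d = j - i.

    This is a generic sliding-window technique, not the paper's method.
    """
    starts = len(text) - m + 1
    if starts <= 0:
        return []
    table = [0] * starts
    for d in range(1, starts):
        # Mismatches between text[0:m] and text[d:d+m].
        mism = sum(text[t] != text[t + d] for t in range(m))
        i = 0
        while True:
            if mism <= k:
                table[i] += 1
                table[i + d] += 1
            if i + d + 1 >= starts:
                break
            # Slide the window one step: drop the leftmost aligned pair,
            # add the new rightmost aligned pair.
            mism -= text[i] != text[i + d]
            mism += text[i + m] != text[i + m + d]
            i += 1
    return table


if __name__ == "__main__":
    # "abab" has length-2 substrings ab, ba, ab at positions 0, 1, 2.
    print(mappability_naive("abab", 2, 0))     # [1, 0, 1]
    print(mappability_diagonal("abab", 2, 0))  # [1, 0, 1]
```

Both functions return one entry per starting position of a length-$m$ substring, so the table has $n - m + 1$ entries. The quadratic cost of either version is what the $\mathcal{O}(n \cdot \min\{m^k,\log^k n\})$ bound stated in the abstract improves upon for constant $k$.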

Keywords

graph algorithm, succinct data structure, approximation algorithm

Cite

@article{arxiv.1807.11702,
  title  = {Efficient Computation of Sequence Mappability},
  author = {Panagiotis Charalampopoulos and Costas S. Iliopoulos and Tomasz Kociumaka and Solon P. Pissis and Jakub Radoszewski and Juliusz Straszyński},
  journal= {arXiv preprint arXiv:1807.11702},
  year   = {2021}
}

Comments

Accepted to SPIRE 2018
