Efficient Computation of Sequence Mappability
Abstract
In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=O(1)$, works in $O(n)$ space and, with high probability, in $O(n \cdot \min\{m^k, \log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $O(n^2)$-time algorithms to compute all $(k,m)$-mappability tables for a fixed $m$ and all $k \in \{0,\ldots,m\}$ or a fixed $k$ and all $m \in \{k,\ldots,n\}$. Finally, we show that, for $k,m = \Theta(\log n)$, the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper that was presented at SPIRE 2018.
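To make the problem statement concrete, the following is a minimal brute-force sketch of the mappability table as defined above. It is not one of the algorithms from the paper: it compares every pair of length-m windows directly, so it takes roughly quadratic time in the number of windows, times m. The function name `naive_mappability`, its parameter names, and the use of 0-based positions are assumptions made for illustration only.

```python
def naive_mappability(text: str, m: int, k: int) -> list[int]:
    """Brute-force mappability table (illustrative baseline only).

    Entry i counts the positions j != i such that the length-m substrings
    of `text` starting at i and j differ in at most k positions.
    Positions are 0-based here, which is an assumption of this sketch.
    """
    windows = len(text) - m + 1          # number of length-m substrings
    table = [0] * max(windows, 0)
    for i in range(windows):
        for j in range(i + 1, windows):
            mismatches = 0
            for a, b in zip(text[i:i + m], text[j:j + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > k:   # stop early once the budget is exceeded
                        break
            if mismatches <= k:
                # The relation is symmetric, so credit both positions.
                table[i] += 1
                table[j] += 1
    return table


if __name__ == "__main__":
    # Windows of "abab" with m = 2 are "ab", "ba", "ab".
    print(naive_mappability("abab", m=2, k=0))  # [1, 0, 1]
```

A baseline like this is mainly useful for checking a faster implementation on small inputs; it says nothing about how the near-linear-time algorithm described in the abstract works internally.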
Cite
@article{arxiv.1807.11702,
title = {Efficient Computation of Sequence Mappability},
author = {Panagiotis Charalampopoulos and Costas S. Iliopoulos and Tomasz Kociumaka and Solon P. Pissis and Jakub Radoszewski and Juliusz Straszyński},
journal= {arXiv preprint arXiv:1807.11702},
year = {2021}
}
Comments
Accepted to SPIRE 2018