Faster Approximate String Matching for Short Patterns

Philip Bille


Data Structures and Algorithms 2011-03-21 v2


Abstract

We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$, we present an algorithm using time $O(nk \cdot \min(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}) + n)$. When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$, this improves the best previously known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.
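The Landau-Vishkin algorithm named above works on the diagonals of the edit-distance table: for each diagonal d = j - i and each error count e from 0 to k, it records the furthest row reachable with at most e errors and then extends that row along matching characters. The Python sketch below illustrates only this classic diagonal recurrence, not the tabulation and word-level parallel implementation that the paper contributes. The function names are hypothetical. The slide step compares characters one at a time instead of using the constant-time longest-common-extension queries that yield the O(nk) bound. Sellers' O(nm) dynamic program is included as a reference implementation of the problem definition. Positions are reported as 1-indexed ending positions in Q.

```python
from typing import Dict, List


def sellers_matches(p: str, q: str, k: int) -> List[int]:
    """Reference O(mn) dynamic program: end positions j (1-indexed) in q
    such that some substring of q ending at j is within edit distance k of p."""
    m = len(p)
    col = list(range(m + 1))  # D[i][0] = i
    out = []
    for j in range(1, len(q) + 1):
        new = [0] * (m + 1)  # D[0][j] = 0: a match may start anywhere in q
        for i in range(1, m + 1):
            new[i] = min(col[i] + 1,                           # extra character q[j-1]
                         new[i - 1] + 1,                       # extra character p[i-1]
                         col[i - 1] + (p[i - 1] != q[j - 1]))  # match or substitute
        if new[m] <= k:
            out.append(j)
        col = new
    return out


def landau_vishkin_matches(p: str, q: str, k: int) -> List[int]:
    """Diagonal-based k-differences search in the style of Landau-Vishkin.
    For the current error count e, prev[d] is the furthest row i (number of
    pattern characters processed) on diagonal d = j - i with D[i][j] <= e."""
    m, n = len(p), len(q)

    def slide(d: int, row: int) -> int:
        # Naive extension along matching characters; an O(1) LCE query
        # would replace this loop in the O(nk) version.
        while row < m and row + d < n and p[row] == q[row + d]:
            row += 1
        return row

    # e = 0: only diagonals d >= 0 contain cells with D[i][j] == 0.
    prev: Dict[int, int] = {d: slide(d, 0) for d in range(0, n + 1)}
    for e in range(1, k + 1):
        cur: Dict[int, int] = {}
        for d in range(max(-e, -m), n + 1):
            cands = []
            if d in prev:
                cands.append(prev[d] + 1)      # substitution on diagonal d
            if d - 1 in prev:
                cands.append(prev[d - 1])      # extra character of q
            if d + 1 in prev:
                cands.append(prev[d + 1] + 1)  # extra character of p
            row = min(max(cands), m, n - d)    # stay inside the table
            cur[d] = slide(d, row)
        prev = cur
    # D[m][j] <= k  iff  diagonal d = j - m reaches row m with <= k errors.
    return [d + m for d in range(max(-k, -m), n - m + 1)
            if d + m >= 1 and prev.get(d) == m]


if __name__ == "__main__":
    print(landau_vishkin_matches("ab", "xaby", 1))  # [2, 3, 4]
```

Because the edit distance is nondecreasing along each diagonal, storing only the furthest row per diagonal and error count suffices. This is why the recurrence touches O(nk) diagonal entries instead of all O(nm) cells of the full table.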

Keywords

string algorithms, succinct data structure, approximation algorithm

Cite

@article{arxiv.0811.3490,
  title  = {Faster Approximate String Matching for Short Patterns},
  author = {Philip Bille},
  journal= {arXiv preprint arXiv:0811.3490},
  year   = {2011}
}

Comments

To appear in Theory of Computing Systems
