
Faster Approximate String Matching for Short Patterns

Data Structures and Algorithms 2011-03-21 v2

Abstract

We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$ we present an algorithm using time
$$O\left(nk \cdot \min\left(\frac{\log^2 m}{\log n}, \frac{\log^2 m \log w}{w}\right) + n\right).$$
When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.
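
To make the problem statement concrete, here is a minimal sketch of the textbook dynamic-programming baseline, which runs in O(nm) time. It is not the algorithm of the paper (which builds on Landau-Vishkin with tabulation and word-level parallelism); it only illustrates what "all ending positions with edit distance at most k" means. The function name and the choice of 1-based ending positions are assumptions made for this illustration.

```python
def approx_match_end_positions(P, Q, k):
    """Return the 1-based ending positions j in Q such that some substring
    of Q ending at j has edit distance at most k to P.

    Baseline O(n*m) column-by-column dynamic program (illustrative only).
    """
    m = len(P)
    # col[i] = minimum edit distance between P[:i] and any substring of Q
    # ending at the current text position. Before reading any text,
    # matching P[:i] against the empty string costs i deletions.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(Q, start=1):
        prev_diag = col[0]  # value of col[i-1] in the previous column
        col[0] = 0          # a match may start at any position of Q
        for i in range(1, m + 1):
            old = col[i]    # value of col[i] in the previous column
            col[i] = min(
                prev_diag + (P[i - 1] != c),  # match or substitution
                old + 1,                      # extra character in Q
                col[i - 1] + 1,               # character of P left unmatched
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_end_positions("abc", "xabcx", 0))  # [4]
    print(approx_match_end_positions("abc", "xabcx", 1))  # [3, 4, 5]
```

With k = 0 only the exact occurrence ending at position 4 is reported; with k = 1 the neighboring positions 3 and 5 also qualify, since "ab" and "abcx" are each one edit away from "abc".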

Cite

@article{arxiv.0811.3490,
  title  = {Faster Approximate String Matching for Short Patterns},
  author = {Philip Bille},
  journal= {arXiv preprint arXiv:0811.3490},
  year   = {2011}
}

Comments

To appear in Theory of Computing Systems
