String Matching with Variable Length Gaps
Abstract
We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n \log k + m + \alpha)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $\alpha$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results, this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $\alpha$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern.
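To make the problem statement concrete, the following is a minimal sketch of a naive solver, not the algorithm of the paper, and it makes no attempt to reach the time and space bounds stated above. The representation is an assumption made for illustration: the pattern is given as a list of non-empty strings plus a list of inclusive (lower, upper) gap-length ranges, and ending positions are reported as 0-based exclusive indices into the text. The function names `occurrence_ends` and `vlg_match_ends` are hypothetical.

```python
def occurrence_ends(text, s):
    """Return the exclusive end indices of all (possibly overlapping)
    occurrences of the non-empty string s in text, in increasing order."""
    ends = []
    start = text.find(s)
    while start != -1:
        ends.append(start + len(s))
        start = text.find(s, start + 1)  # step by 1 to allow overlaps
    return ends


def vlg_match_ends(text, strings, gaps):
    """Naive matcher for a pattern s1 g1 s2 g2 ... sk (illustrative only).

    strings: [s1, ..., sk], the non-empty strings of the pattern.
    gaps:    [(a1, b1), ..., (a_{k-1}, b_{k-1})], where the gap between
             s_i and s_{i+1} may have any length in a_i..b_i inclusive.
    Returns the sorted exclusive end indices of substrings of text
    that match the whole pattern.
    """
    if not strings or len(gaps) != len(strings) - 1:
        raise ValueError("need k >= 1 strings and exactly k - 1 gaps")

    # Ends of matches of the pattern prefix processed so far.
    reachable = set(occurrence_ends(text, strings[0]))

    for s, (lo, hi) in zip(strings[1:], gaps):
        extended = set()
        for end in occurrence_ends(text, s):
            start = end - len(s)
            # The previous prefix must end exactly g characters before
            # this occurrence starts, for some allowed gap length g.
            if any(start - g in reachable for g in range(lo, hi + 1)):
                extended.add(end)
        reachable = extended

    return sorted(reachable)


if __name__ == "__main__":
    text = "abxcdabxxxcd"
    print(vlg_match_ends(text, ["ab", "cd"], [(1, 2)]))
    print(vlg_match_ends(text, ["ab", "cd"], [(1, 3)]))
```

In this example the text contains "ab" followed by "cd" after a gap of one character and again after a gap of three characters, so widening the gap range from 1..2 to 1..3 admits the second match as well. The sketch rescans the text for each string and tries every gap length explicitly, which is the kind of overhead the bounds in the abstract avoid.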
Cite
@article{arxiv.1110.2893,
title = {String Matching with Variable Length Gaps},
author = {Philip Bille and Inge Li Goertz and Hjalte Wedel Vildhøj and David Kofoed Wind},
journal= {arXiv preprint arXiv:1110.2893},
year = {2011}
}
Comments
draft of full version, extended abstract at SPIRE 2010