A Generic Framework for Efficient and Effective Subsequence Retrieval

Databases 2012-08-02 v1

Abstract

This paper proposes a general framework for matching similar subsequences in both time series and string databases. The matching results are pairs of query subsequences and database subsequences. The framework finds all pairs of similar subsequences provided that the distance measure satisfies the "consistency" property, which we introduce in this paper. We show that most popular distance functions are "consistent": the Euclidean distance, DTW, ERP, and the Fréchet distance for time series, and the Hamming distance and the Levenshtein distance for strings. We also propose a generic index structure for metric spaces, named the "reference net". The reference net occupies O(n) space, where n is the size of the dataset, and is optimized to work well with our framework. The experiments demonstrate that our method improves retrieval performance when combined with diverse distance measures. They also illustrate that the reference net scales well in terms of space overhead and query time.
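The abstract names the "consistency" property without stating it. The sketch below works under an assumed reading, which should be checked against the paper's formal definition: a distance d is consistent if, for every contiguous subsequence SX of X, some contiguous subsequence SY of Y satisfies d(SX, SY) <= d(X, Y). The helper names (`levenshtein`, `substrings`, `is_consistent_on`) are hypothetical and are not taken from the paper. The code checks this assumed property by brute force for the Levenshtein distance on small strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def substrings(s: str, allow_empty: bool = False) -> set:
    """All contiguous substrings of s, optionally including the empty string."""
    subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    if allow_empty:
        subs.add("")
    return subs


def is_consistent_on(x: str, y: str, dist=levenshtein) -> bool:
    """Brute-force check of the ASSUMED consistency property for one pair (x, y).

    Assumption (not confirmed by the abstract): every non-empty substring of x
    must have some substring of y, possibly empty, at distance <= dist(x, y).
    """
    bound = dist(x, y)
    y_subs = substrings(y, allow_empty=True)
    return all(any(dist(sx, sy) <= bound for sy in y_subs)
               for sx in substrings(x))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(is_consistent_on("kitten", "sitting"))
```

A check like this only tests the property on the given inputs; it does not prove it in general. Its relevance to the framework is that a bound on the distance between whole sequences carries over to their subsequences, which is what would let a matching procedure discard candidates without missing similar pairs.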

Cite

@article{arxiv.1208.0286,
  title  = {A Generic Framework for Efficient and Effective Subsequence Retrieval},
  author = {Haohan Zhu and George Kollios and Vassilis Athitsos},
  journal= {arXiv preprint arXiv:1208.0286},
  year   = {2012}
}

Comments

VLDB 2012