Related papers: A Generic Framework for Efficient and Effective Su…
Similar subtrajectory search is a finer-grained operator that can better capture the similarities between one query trajectory and a portion of a data trajectory than the traditional similar trajectory search, which requires the two checked…
Subsequence matching has appeared to be an ideal approach for solving many problems related to the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be…
Similarity search is the problem of finding in a collection of objects those that are similar to a given query object. It is a fundamental problem in modern applications and the objects considered may be as diverse as locations in space,…
Approximate pattern matching is a natural and well-studied problem on strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the starting positions of) all substrings of $T$ that are at distance at most $k$ from $P$. We…
Finding the most similar subsequences between two multidimensional time series has many applications, for example capturing dependencies in the stock market or discovering coordinated movement of baboons. Considering one pattern occurring in one time…
Traditional pairwise sequence alignment is based on matching individual samples from two sequences, under time monotonicity constraints. However, in many application settings matching subsequences (segments) instead of individual samples…
Time series similarity measures are highly relevant in a wide range of emerging applications including training machine learning models, classification, and predictive modeling. Standard similarity measures for time series most often…
Similarity searching finds application in a wide variety of domains including multilingual databases, computational biology, pattern recognition and text retrieval. Similarity is measured in terms of a distance function, edit distance, in…
Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-lasting goal in the information retrieval…
This paper introduces the sequence covering similarity, which we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a…
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of…
Time series data appears in a variety of applications such as smart transportation and environmental monitoring. One of the fundamental problems for time series analysis is time series forecasting. Despite the success of recent deep time…
A common approach to implementing similarity search applications is the usage of distance functions, where small distances indicate high similarity. In the case of metric distance functions, metric index structures can be used to accelerate…
Similarity search is an important function in many applications, which usually focuses on measuring the similarity between objects with the same type. However, in many scenarios, we need to measure the relatedness between objects with…
Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has…
The paper considers various formalisms based on Automata, Temporal Logic and Regular Expressions for specifying queries over sequences. Unlike traditional binary semantics, the paper presents a similarity based semantics for these…
Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…
The rise of the internet has resulted in an explosion of data consisting of millions of articles, images, songs, and videos. Most of this data is high dimensional and sparse. The need to perform an efficient search for similar objects in such…
We present a new efficient method for approximate search in electronic lexica. Given an input string (the pattern) and a similarity threshold, the algorithm retrieves all entries of the lexicon that are sufficiently similar to the pattern.…
We propose an approximation algorithm for efficient correlation search in time series data. In our method, we use Fourier transform and neural network to embed time series into a low-dimensional Euclidean space. The given space is learned…
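As a concrete illustration of the approximate pattern matching problem stated in one of the abstracts above (a text $T$, a pattern $P$, and a threshold $k$), the following is a minimal sketch and not the method of any listed paper. It assumes the distance is edit distance, as in the edit distance similarity search abstract, and uses the classical Sellers-style dynamic program. The function name `approx_match_starts` is hypothetical. Running the recurrence on the reversed strings is a simple way to report starting positions rather than ending positions.

```python
def approx_match_starts(text, pattern, k):
    """Return sorted start positions i in `text` such that some substring
    beginning at i is within edit distance k of `pattern`.

    Assumption: the distance is unit-cost edit distance
    (insertions, deletions, substitutions).
    """
    # The dynamic program naturally yields END positions of matches.
    # Reversing both strings turns those into START positions in the original.
    t = text[::-1]
    p = pattern[::-1]
    n, m = len(t), len(p)

    # col[i] = min edit distance between p[:i] and any substring of t
    # ending at the current position j (initially j = 0).
    col = list(range(m + 1))
    starts = []

    for j in range(1, n + 1):
        prev_diag = col[0]   # value of D[0][j-1], always 0
        col[0] = 0           # a match may begin anywhere in the text
        for i in range(1, m + 1):
            above_left = prev_diag
            prev_diag = col[i]                     # D[i][j-1], saved for next row
            cost = 0 if p[i - 1] == t[j - 1] else 1
            col[i] = min(above_left + cost,        # match or substitution
                         prev_diag + 1,            # skip a text character
                         col[i - 1] + 1)           # skip a pattern character
        if col[m] <= k:
            # A match ends at reversed position j, so it starts at n - j
            # in the original text.
            starts.append(n - j)

    return sorted(starts)


if __name__ == "__main__":
    print(approx_match_starts("banana", "ana", 0))
    print(approx_match_starts("abcdefg", "cxe", 1))
```

The sketch runs in $O(|T| \cdot |P|)$ time and $O(|P|)$ space; the indexing and lower-bound results discussed in the listed abstracts concern doing better than this baseline.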