Related papers: The Extended Edit Distance Metric

Experimental Comparison of Representation Methods and Distance Measures for Time Series Data

The previous decade has brought a remarkable increase in interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation…

Artificial Intelligence · Computer Science 2015-03-17 Xiaoyue Wang, Hui Ding, Goce Trajcevski, Peter Scheuermann, Eamonn Keogh

Measuring Congruence on High Dimensional Time Series

A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where…

Computational Complexity · Computer Science 2018-06-04 Jörg P. Bachmann, Johann-Christoph Freytag, Benjamin Hauskeller, Nicole Schweikardt

Free congruence: an exploration of expanded similarity measures for time series data

Time series similarity measures are highly relevant in a wide range of emerging applications including training machine learning models, classification, and predictive modeling. Standard similarity measures for time series most often…

Machine Learning · Computer Science 2021-01-22 Lucas Cassiel Jacaruso

Metrics for Inter-Dataset Similarity with Example Applications in Synthetic Data and Feature Selection Evaluation -- Extended Version

Measuring inter-dataset similarity is an important task in machine learning and data mining with various use cases and applications. Existing methods for measuring inter-dataset similarity are computationally expensive, limited, or…

Machine Learning · Computer Science 2025-05-06 Muhammad Rajabinasab, Anton D. Lautrup, Arthur Zimek

Distances between Data Sets Based on Summary Statistics

The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics computed from the data sets. The initial definition of our distance…

Data Structures and Algorithms · Computer Science 2019-02-05 Nikolaj Tatti

Universal Similarity

We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a…

Information Retrieval · Computer Science 2007-05-23 Paul Vitanyi

We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Rudi Cilibrasi, Paul Vitanyi

Distances for Comparing Multisets and Sequences

Measuring the distance between data points is fundamental to many statistical techniques, such as dimension reduction or clustering algorithms. However, improvements in data collection technologies have led to a growing versatility of…

Methodology · Statistics 2022-06-20 George Bolt, Simón Lunagómez, Christopher Nemeth

Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series

The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series Euclidean Distance…

Machine Learning · Computer Science 2022-12-14 Audrey Der, Chin-Chia Michael Yeh, Renjie Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

Fairest Neighbors: Tradeoffs Between Metric Queries

Metric search commonly involves finding objects similar to a given sample object. We explore a generalization, where the desired result is a fair tradeoff between multiple query objects. This builds on previous results on complex queries,…

Data Structures and Algorithms · Computer Science 2021-08-10 Magnus Lie Hetland, Halvard Hummel

Metric Learning on Manifolds

Recent literature has shown that symbolic data, such as text and graphs, is often better represented by points on a curved manifold, rather than in Euclidean space. However, geometrical operations on manifolds are generally more complicated…

Machine Learning · Computer Science 2019-02-06 Max Aalto, Nakul Verma

Indexing Metric Spaces for Exact Similarity Search

With the continued digitization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when…

Databases · Computer Science 2022-05-24 Lu Chen, Yunjun Gao, Xuan Song, Zheng Li, Yifan Zhu, Xiaoye Miao, Christian S. Jensen

Distance approximation using Isolation Forests

This work briefly explores the possibility of approximating spatial distance (alternatively, similarity) between data points using the Isolation Forest method envisioned for outlier detection. The logic is similar to that of isolation: the…

Machine Learning · Statistics 2019-11-26 David Cortes

Semantically-informed distance and similarity measures for paraphrase plagiarism identification

Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the…

Computation and Language · Computer Science 2018-05-31 Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda

Distance Measures for Geometric Graphs

A geometric graph is a combinatorial graph, endowed with a geometry that is inherited from its embedding in a Euclidean space. Formulation of a meaningful measure of (dis-)similarity in both the combinatorial and geometric structures of two…

Computational Geometry · Computer Science 2022-09-27 Sushovan Majhi, Carola Wenk

Permutation Jensen-Shannon distance: A versatile and fast symbolic tool for complex time series analysis

The main motivation of this paper is to introduce the permutation Jensen-Shannon distance, a symbolic tool able to quantify the degree of similarity between two arbitrary time series. This quantifier results from the fusion of two concepts,…

Data Analysis, Statistics and Probability · Physics 2022-04-20 Luciano Zunino, Felipe Olivares, Haroldo V. Ribeiro, Osvaldo A. Rosso

Metric Statistics: Exploration and Inference for Random Objects With Distance Profiles

This article provides an overview of the statistical modeling of complex data as increasingly encountered in modern data analysis. It is argued that such data can often be described as elements of a metric space that satisfies certain…

Methodology · Statistics 2024-02-28 Paromita Dubey, Yaqing Chen, Hans-Georg Müller

The Historical Significance of Textual Distances

Measuring similarity is a basic task in information retrieval, and now often a building-block for more complex arguments about cultural change. But do measures of textual similarity and distance really correspond to evidence about cultural…

Computation and Language · Computer Science 2018-07-03 Ted Underwood

Towards a faster symbolic aggregate approximation method

The similarity search problem is one of the main problems in time series data mining. Traditionally, this problem was tackled by sequentially comparing the given query against all the time series in the database, and returning all the time…

Databases · Computer Science 2013-01-25 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

DIMS: Distributed Index for Similarity Search in Metric Spaces

Similarity search finds objects that are similar to a given query object based on a similarity metric. As the amount and variety of data continue to grow, similarity search in metric spaces has gained significant attention. Metric spaces…

Databases · Computer Science 2024-10-08 Yifan Zhu, Chengyang Luo, Tang Qian, Lu Chen, Yunjun Gao, Baihua Zheng