Related papers: Distances between Data Sets Based on Summary Stati…

Distances for Comparing Multisets and Sequences

Measuring the distance between data points is fundamental to many statistical techniques, such as dimension reduction or clustering algorithms. However, improvements in data collection technologies have led to a growing versatility of…

Methodology · Statistics 2022-06-20 George Bolt, Simón Lunagómez, Christopher Nemeth

Geometric Dataset Distances via Optimal Transport

The notion of task similarity is at the core of various machine learning paradigms, such as domain adaptation and meta-learning. Current methods to quantify it are often heuristic, make strong assumptions on the label sets across the tasks,…

Machine Learning · Computer Science 2020-02-10 David Alvarez-Melis, Nicolò Fusi

The Extended Edit Distance Metric

Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…

Information Retrieval · Computer Science 2010-06-18 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

Distance Between Sets - A survey

The purpose of this paper is to give a survey on the notions of distance between subsets either of a metric space or of a measure space, including definitions, a classification, and a discussion of the best-known distance functions, which…

Functional Analysis · Mathematics 2018-08-09 A. Conci, C. S. Kubrusly

Distance approximation using Isolation Forests

This work briefly explores the possibility of approximating spatial distance (alternatively, similarity) between data points using the Isolation Forest method envisioned for outlier detection. The logic is similar to that of isolation: the…

Machine Learning · Statistics 2019-11-26 David Cortes

A Survey on Metric Learning for Feature Vectors and Structured Data

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This…

Machine Learning · Computer Science 2019-01-25 Aurélien Bellet, Amaury Habrard, Marc Sebban

Magnitude Distance: A Geometric Measure of Dataset Similarity

Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance metric defined on finite datasets using the notion of the magnitude of…

Machine Learning · Computer Science 2026-02-10 Sahel Torkamani, Henry Gouk, Rik Sarkar

Mahalanobis Distance Metric Learning Algorithm for Instance-based Data Stream Classification

With today's massive data challenges and the rapid growth of technology, stream mining has recently received considerable attention. To address the large number of scenarios in which this phenomenon manifests itself, suitable tools are…

Machine Learning · Computer Science 2016-04-19 Jorge Luis Rivero Perez, Bernardete Ribeiro, Carlos Morell Perez

Same But Different: Distance Correlations Between Topological Summaries

Persistent homology allows us to create topological summaries of complex data. In order to analyse these statistically, we need to choose a topological summary and a relevant metric space in which this topological summary exists. While…

Algebraic Topology · Mathematics 2019-06-24 Katharine Turner, Gard Spreemann

Mahalanonbis Distance Informed by Clustering

A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of…

Machine Learning · Statistics 2017-08-15 Almog Lahav, Ronen Talmon, Yuval Kluger

An Empirical Comparison of Methods for Quantifying the Similarity of Categorical Datasets

Quantifying the similarity of two or more datasets has widespread applications in statistics and machine learning. The method choice is, however, difficult due to the abundance of proposed methods and the lack of neutral comparison studies,…

Methodology · Statistics 2026-04-14 Marieke Stolte, Jörg Rahnenführer, Andrea Bommert

Measuring Congruence on High Dimensional Time Series

A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where…

Computational Complexity · Computer Science 2018-06-04 Jörg P. Bachmann, Johann-Christoph Freytag, Benjamin Hauskeller, Nicole Schweikardt

Ranking the information content of distance measures

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using…

Machine Learning · Statistics 2022-05-27 Aldo Glielmo, Claudio Zeni, Bingqing Cheng, Gabor Csanyi, Alessandro Laio

On Mahalanobis distance in functional settings

Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample…

Methodology · Statistics 2018-03-20 José R. Berrendero, Beatriz Bueno-Larraz, Antonio Cuevas

Reconciling Similar Sets of Data

In this work, we consider the problem of synchronizing two sets of data where the size of the symmetric difference between the sets is small and, in addition, the elements in the symmetric difference are related through the Hamming distance…

Information Theory · Computer Science 2018-09-14 Ryan Gabrys, Farzad Farnoud

A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science

Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the…

Machine Learning · Computer Science 2024-10-11 Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K Lappalainen, Jakob H Macke, Guy Moss, Matthijs Pals, Felix Pei, Rachel Rapp, A Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, Julius Vetter

An Investigation into Distance Measures in Cluster Analysis

This report provides an exploration of different distance measures that can be used with the K-means algorithm for cluster analysis. Specifically, we investigate the Mahalanobis distance, and critically assess any benefits it may have…

Other Statistics · Statistics 2024-04-23 Zoe Shapcott

Metrics Based on Average Distance Between Sets

This paper presents a distance function between sets based on an average of distances between their elements. The distance function is a metric if the sets are non-empty finite subsets of a metric space. It can be applied to produce various…

Metric Geometry · Mathematics 2011-09-13 Osamu Fujita

On the distribution of cross-validated Mahalanobis distances

We present analytical expressions for the means and covariances of the sample distribution of the cross-validated Mahalanobis distance. This measure has proven to be especially useful in the context of representational similarity analysis…

Applications · Statistics 2016-07-06 Jörn Diedrichsen, Serge Provost, Hossein Zareamoghaddam

Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series

The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series Euclidean Distance…

Machine Learning · Computer Science 2022-12-14 Audrey Der, Chin-Chia Michael Yeh, Renjie Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh