Related papers: Universal Similarity

We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Rudi Cilibrasi, Paul Vitanyi

Compression-based Similarity

First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned,…

Information Theory · Computer Science 2011-10-21 Paul M. B. Vitanyi

Measuring Global Similarity between Texts

We propose a new similarity measure between texts which, contrary to the current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted…

Computation and Language · Computer Science 2014-05-15 Uli Fahrenberg, Fabrizio Biondi, Kevin Corre, Cyrille Jegourel, Simon Kongshøj, Axel Legay

The Extended Edit Distance Metric

Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…

Information Retrieval · Computer Science 2010-06-18 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

The similarity metric

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of…

Computational Complexity · Computer Science 2011-11-09 Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi

Simple Distances for Trajectories via Landmarks

We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks. These distances easily and interpretably map objects to a Euclidean space, are simple to compute,…

Computational Geometry · Computer Science 2019-06-13 Jeff M. Phillips, Pingfan Tang

The Google Similarity Distance

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the…

Computation and Language · Computer Science 2007-06-13 Rudi Cilibrasi, Paul M. B. Vitanyi

Understanding (dis)similarity measures

Intuitively, the concept of similarity is the notion to measure an inexact matching between two entities of the same reference set. The notions of similarity and its close relative dissimilarity are widely used in many fields of Artificial…

Artificial Intelligence · Computer Science 2012-12-13 Lluís A. Belanche

The Generalized Universal Law of Generalization

It has been argued by Shepard that there is a robust psychological law that relates the distance between a pair of items in psychological space and the probability that they will be confused with each other. Specifically, the probability of…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Nick Chater, Paul Vitanyi

Normalized Information Distance

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…

Information Retrieval · Computer Science 2008-09-16 Paul M. B. Vitanyi, Frank J. Balbach, Rudi L. Cilibrasi, Ming Li

Free congruence: an exploration of expanded similarity measures for time series data

Time series similarity measures are highly relevant in a wide range of emerging applications including training machine learning models, classification, and predictive modeling. Standard similarity measures for time series most often…

Machine Learning · Computer Science 2021-01-22 Lucas Cassiel Jacaruso

Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For…

Information Retrieval · Computer Science 2020-07-24 Andrew R. Cohen, Paul M. B. Vitanyi

Measuring Human-perceived Similarity in Heterogeneous Collections

We present a technique for estimating the similarity between objects such as movies or foods whose proper representation depends on human perception. Our technique combines a modest number of human similarity assessments to infer a pairwise…

Artificial Intelligence · Computer Science 2018-02-19 Jesse Anderton, Pavel Metrikov, Virgil Pavlu, Javed Aslam

Measuring Congruence on High Dimensional Time Series

A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where…

Computational Complexity · Computer Science 2018-06-04 Jörg P. Bachmann, Johann-Christoph Freytag, Benjamin Hauskeller, Nicole Schweikardt

A Guide to Similarity Measures

Similarity measures play a central role in various data science application domains for a wide assortment of tasks. This guide describes a comprehensive set of prevalent similarity measures to serve both non-experts and professional.…

Information Retrieval · Computer Science 2024-08-16 Avivit Levy, B. Riva Shalom, Michal Chalamish

Generalization of distance to higher dimensional objects

The measurement of distance between two objects is generalized to the case where the objects are no longer points but are one-dimensional. Additional concepts such as non-extensibility, curvature constraints, and non-crossing become central…

Soft Condensed Matter · Physics 2008-03-04 Steven S. Plotkin

Normalized Web Distance and Word Similarity

There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at…

Computation and Language · Computer Science 2009-05-26 Rudi L. Cilibrasi, Paul M. B. Vitanyi

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Similarity query is the family of queries based on some similarity metrics. Unlike the traditional database queries which are mostly based on value equality, similarity queries aim to find targets "similar enough to" the given data objects,…

Databases · Computer Science 2022-04-19 Yifan Wang

Generalization-based similarity

Detecting and exploiting similarities between seemingly distant objects is without doubt an important human ability. This paper develops from the ground up an abstract algebraic and qualitative notion of similarity based on the…

Artificial Intelligence · Computer Science 2025-05-20 Christian Antić

Generalized Compression Dictionary Distance as Universal Similarity Measure

We present a new similarity measure based on information theoretic measures which is superior to Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity. We show that…

Machine Learning · Statistics 2014-10-22 Andrey Bogomolov, Bruno Lepri, Fabio Pianesi