Related papers: Normalized Web Distance and Word Similarity

Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For…

Information Retrieval · Computer Science 2020-07-24 Andrew R. Cohen, Paul M. B. Vitanyi

Normalized Google Distance of Multisets with Applications

Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between…

Information Retrieval · Computer Science 2013-08-15 Andrew R. Cohen, P. M. B. Vitanyi

The Google Similarity Distance

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the…

Computation and Language · Computer Science 2007-06-13 Rudi Cilibrasi, Paul M. B. Vitanyi

Just an Update on PMING Distance for Web-based Semantic Similarity in Artificial Intelligence and Data Mining

One of the main problems that emerges in the classic approach to semantics is the difficulty in acquisition and maintenance of ontologies and semantic annotations. On the other hand, the Internet explosion and the massive diffusion of…

Artificial Intelligence · Computer Science 2017-01-12 Valentina Franzoni

Word Embedding based Edit Distance

Text similarity calculation is a fundamental problem in natural language processing and related fields. In recent years, deep neural networks have been developed to perform the task and high performances have been achieved. The neural…

Computation and Language · Computer Science 2018-10-26 Yilin Niu, Chao Qiao, Hang Li, Minlie Huang

Normalized Information Distance

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…

Information Retrieval · Computer Science 2008-09-16 Paul M. B. Vitanyi, Frank J. Balbach, Rudi L. Cilibrasi, Ming Li

Contextualized Semantic Distance between Highly Overlapped Texts

Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation. Better evaluation of the semantic distance between the overlapped sentences benefits the language…

Computation and Language · Computer Science 2023-06-14 Letian Peng, Zuchao Li, Hai Zhao

We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Rudi Cilibrasi, Paul Vitanyi

This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the…

cmp-lg · Computer Science 2008-02-03 Jay J. Jiang, David W. Conrath

Universal Similarity

We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a…

Information Retrieval · Computer Science 2007-05-23 Paul Vitanyi

Evaluating the impact of word embeddings on similarity scoring in practical information retrieval

Search behaviour is characterised using synonymy and polysemy as users often want to search information based on meaning. Semantic representation strategies represent a move towards richer associative connections that can adequately capture…

Information Retrieval · Computer Science 2026-02-06 Niall McCarroll, Kevin Curran, Eugene McNamee, Angela Clist, Andrew Brammer

Google distance between words

Cilibrasi and Vitanyi have demonstrated that it is possible to extract the meaning of words from the world-wide web. To achieve this, they rely on the number of webpages that are found through a Google search containing a given word and…

Computation and Language · Computer Science 2015-01-29 Bjørn Kjos-Hanssen, Alberto J. Evangelista

Re-evaluating Word Mover's Distance

The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation.…

Machine Learning · Computer Science 2022-06-16 Ryoma Sato, Makoto Yamada, Hisashi Kashima

Towards Normalizing the Edit Distance Using a Genetic Algorithms Based Scheme

The normalized edit distance is one of the distances derived from the edit distance. It is useful in some applications because it takes into account the lengths of the two strings compared. The normalized edit distance is not defined in…

Neural and Evolutionary Computing · Computer Science 2013-12-09 Muhammad Marwan Muhammad Fuad

Ontological differentiation as a measure of semantic accuracy

Understanding semantic relationships within complex networks derived from lexical resources is fundamental for network science and language modeling. While network embedding methods capture contextual similarity, quantifying semantic…

Disordered Systems and Neural Networks · Physics 2026-01-09 Pablo Garcia-Cuadrillero, Fabio Revuelta, Jose Angel Capitan

Speeding up Word Mover's Distance and its variants via properties of distances between embeddings

The Word Mover's Distance (WMD) proposed by Kusner et al. is a distance between documents that takes advantage of semantic relations among words that are captured by their embeddings. This distance proved to be quite effective, obtaining…

Computation and Language · Computer Science 2020-05-12 Matheus Werner, Eduardo Laber

Unveiling the relationship between complex networks metrics and word senses

The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information…

Physics and Society · Physics 2013-02-20 Diego R. Amancio, Osvaldo N. Oliveira, Luciano da F. Costa

Web Pages Clustering: A New Approach

The rapid growth of web has resulted in vast volume of information. Information availability at a rapid speed to the user is vital. English language (or any for that matter) has lot of ambiguity in the usage of words. So there is no…

Information Retrieval · Computer Science 2011-08-30 Jeevan H E, Prashanth P P, Punith Kumar S N, Vinay Hegde

Nonapproximability of the Normalized Information Distance

Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.…

Computational Complexity · Computer Science 2009-10-23 Sebastiaan A. Terwijn, Leen Torenvliet, Paul M. B. Vitanyi

Word Rotator's Distance

A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically…

Computation and Language · Computer Science 2020-11-17 Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui