Related papers: Google distance between words

The Google Similarity Distance

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the…

Computation and Language · Computer Science 2007-06-13 Rudi Cilibrasi, Paul M. B. Vitanyi

Compression-based Similarity

First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned,…

Information Theory · Computer Science 2011-10-21 Paul M. B. Vitanyi

A set of ontology matching algorithms (for finding correspondences between concepts) is based on a thesaurus that provides the source data for the semantic distance calculations. In this wiki era, new resources may spring up and improve…

Information Retrieval · Computer Science 2009-10-12 A. A. Krizhanovsky, Feiyu Lin

Measuring Meaning on the World-Wide Web

We introduce the notion of the 'meaning bound' of a word with respect to another word by making use of the World-Wide Web as a conceptual environment for meaning. The meaning of a word with respect to another word is established by…

Artificial Intelligence · Computer Science 2012-03-28 Diederik Aerts

Normalized Web Distance and Word Similarity

There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at…

Computation and Language · Computer Science 2009-05-26 Rudi L. Cilibrasi, Paul M. B. Vitanyi

A semantic association page rank algorithm for web search engines

The majority of Semantic Web search engines retrieve information by focusing on the use of concepts and relations restricted to the query provided by the user. By trying to guess the implicit meaning between these concepts and relations,…

Information Retrieval · Computer Science 2012-11-28 Manuel Rojas

We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Rudi Cilibrasi, Paul Vitanyi

Determining the Unithood of Word Sequences using a Probabilistic Approach

Most research related to unithood was conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work was mostly empirically…

Artificial Intelligence · Computer Science 2008-10-02 Wilson Wong, Wei Liu, Mohammed Bennamoun

This paper proposes a method for measuring semantic similarity between words as a new tool for text analysis. The similarity is measured on a semantic network constructed systematically from a subset of the English dictionary, LDOCE…

cmp-lg · Computer Science 2008-02-03 Hideki Kozima, Teiji Furugori

Universal Similarity

We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a…

Information Retrieval · Computer Science 2007-05-23 Paul Vitanyi

Normalized Google Distance of Multisets with Applications

Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between…

Information Retrieval · Computer Science 2013-08-15 Andrew R. Cohen, P. M. B. Vitanyi

Measuring Global Similarity between Texts

We propose a new similarity measure between texts which, contrary to the current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted…

Computation and Language · Computer Science 2014-05-15 Uli Fahrenberg, Fabrizio Biondi, Kevin Corre, Cyrille Jegourel, Simon Kongshøj, Axel Legay

Distributional semantics beyond words: Supervised learning of analogy and paraphrase

There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to…

Machine Learning · Computer Science 2013-10-21 Peter D. Turney

Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For…

Information Retrieval · Computer Science 2020-07-24 Andrew R. Cohen, Paul M. B. Vitanyi

A Visual Distance for WordNet

Measuring the distance between concepts is an important field of study of Natural Language Processing, as it can be used to improve tasks related to the interpretation of those same concepts. WordNet, which includes a wide variety of…

Computation and Language · Computer Science 2018-04-30 Raquel Pérez-Arnal, Armand Vilalta, Dario Garcia-Gasulla, Ulises Cortés, Eduard Ayguadé, Jesus Labarta

A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is…

Computation and Language · Computer Science 2017-03-13 Christina Lioma, Niels Dalum Hansen

Towards Quantifying the Distance between Opinions

Increasingly, critical decisions in public policy, governance, and business strategy rely on a deeper understanding of the needs and opinions of constituent members (e.g. citizens, shareholders). While it has become easier to collect a…

Computation and Language · Computer Science 2020-01-28 Saket Gurukar, Deepak Ajwani, Sourav Dutta, Juho Lauri, Srinivasan Parthasarathy, Alessandra Sala

Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

Most work related to unithood was conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent studies that examine the notion of unithood and produce dedicated techniques for measuring…

Artificial Intelligence · Computer Science 2008-10-02 Wilson Wong, Wei Liu, Mohammed Bennamoun

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and…

Computation and Language · Computer Science 2007-05-23 Ido Dagan, Lillian Lee, Fernando C. N. Pereira

Web Pages Clustering: A New Approach

The rapid growth of the web has resulted in a vast volume of information. Making information available to the user quickly is vital. The English language (or any language, for that matter) has a lot of ambiguity in the usage of words. So there is no…

Information Retrieval · Computer Science 2011-08-30 Jeevan H E, Prashanth P P, Punith Kumar S N, Vinay Hegde