Related papers: Normalized Web Distance and Word Similarity
Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For…
Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between…
Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the…
One of the main problems that emerges in the classic approach to semantics is the difficulty in acquisition and maintenance of ontologies and semantic annotations. On the other hand, the Internet explosion and the massive diffusion of…
Text similarity calculation is a fundamental problem in natural language processing and related fields. In recent years, deep neural networks have been developed to perform the task and high performances have been achieved. The neural…
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation. Better evaluation of the semantic distance between the overlapped sentences benefits the language…
We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a…
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the…
We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a…
Search behaviour is characterised using synonymy and polysemy as users often want to search information based on meaning. Semantic representation strategies represent a move towards richer associative connections that can adequately capture…
Cilibrasi and Vitanyi have demonstrated that it is possible to extract the meaning of words from the world-wide web. To achieve this, they rely on the number of webpages that are found through a Google search containing a given word and…
The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation.…
The normalized edit distance is one of the distances derived from the edit distance. It is useful in some applications because it takes into account the lengths of the two strings compared. The normalized edit distance is not defined in…
Understanding semantic relationships within complex networks derived from lexical resources is fundamental for network science and language modeling. While network embedding methods capture contextual similarity, quantifying semantic…
The Word Mover's Distance (WMD) proposed by Kusner et al. is a distance between documents that takes advantage of semantic relations among words that are captured by their embeddings. This distance proved to be quite effective, obtaining…
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information…
The rapid growth of web has resulted in vast volume of information. Information availability at a rapid speed to the user is vital. English language (or any for that matter) has lot of ambiguity in the usage of words. So there is no…
Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.…
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically…