Related papers: Content-based Text Categorization using Wikitology

An improved semantic similarity measure for document clustering based on topic maps

A major computational burden in document clustering is calculating the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science 2013-03-19 Muhammad Rafi, Mohammad Shahid Shaikh

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

SimDoc: Topic Sequence Alignment based Document Similarity Framework

Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise relevant tasks such as document clustering,…

Computation and Language · Computer Science 2017-11-15 Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann

Semantic Document Clustering on Named Entity Features

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

Biomedical Document Clustering and Visualization based on the Concepts of Diseases

Document clustering is a text mining technique used to provide better document search and browsing in digital libraries or online corpora. A lot of research has been done on biomedical document clustering that is based on using existing…

Computation and Language · Computer Science 2018-10-24 Setu Shah, Xiao Luo

Calculating Semantic Similarity between Academic Articles using Topic Event and Ontology

Determining semantic similarity between academic documents is crucial to many tasks such as plagiarism detection, automatic technical survey and semantic search. Current studies mostly focus on semantic similarity between concepts,…

Computation and Language · Computer Science 2017-12-01 Ming Liu, Bo Lang, Zepeng Gu

Measuring similarity between texts is an important task for several applications. Available approaches to measure document similarity are inadequate for document pairs that have non-comparable lengths, such as a long document and its…

Computation and Language · Computer Science 2019-03-27 Hongyu Gong, Tarek Sakakini, Suma Bhat, Jinjun Xiong

Contextual Document Similarity for Content-based Literature Recommender Systems

To cope with the ever-growing information overload, an increasing number of digital libraries employ content-based recommender systems. These systems traditionally recommend related documents with the help of similarity measures. However,…

Information Retrieval · Computer Science 2020-08-04 Malte Ostendorff

A comparison of two suffix tree-based document clustering algorithms

Document clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large collections of document repositories like the World Wide Web (WWW). Recently, the focus in this domain shifted from traditional…

Information Retrieval · Computer Science 2012-01-11 Muhammad Rafi, M. Maujood, M. M. Fazal, S. M. Ali

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

We propose a computationally light method for estimating similarities between text documents, which we call the density similarity (DS) method. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel…

Computation and Language · Computer Science 2020-09-03 Ilia Rushkin

A Comprehensive Comparative Study of Word and Sentence Similarity Measures

Sentence similarity is considered the basis of many natural language tasks such as information retrieval, question answering and text summarization. The semantic meaning between compared text fragments is based on the words semantic…

Information Retrieval · Computer Science 2016-10-17 Issa Atoum, Ahmed Otoom, Narayanan Kulathuramaiyer

Description and Evaluation of Semantic Similarity Measures Approaches

In recent years, semantic similarity measures have attracted great interest in Semantic Web and Natural Language Processing (NLP). Several similarity measures have been developed, given the existence of a structured knowledge representation…

Computation and Language · Computer Science 2013-10-31 Thabet Slimani

Efficient Clustering from Distributions over Topics

There are many scenarios where we may want to find pairs of textually similar documents in a large corpus (e.g. a researcher doing literature review, or an R&D project manager analyzing project proposals). To programmatically discover those…

Computation and Language · Computer Science 2020-12-16 Carlos Badenes-Olmedo, Jose-Luis Redondo García, Oscar Corcho

A new simple and effective measure for bag-of-word inter-document similarity measurement

To measure the similarity of two documents in the bag-of-words (BoW) vector representation, different term weighting schemes are used to improve the performance of cosine similarity, the most widely used inter-document similarity measure…

Information Retrieval · Computer Science 2019-02-12 Sunil Aryal, Kai Ming Ting, Takashi Washio, Gholamreza Haffari

A Topological Method for Comparing Document Semantics

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, the tools for this task are still rare. On the other hand, most relevant methods are devised from…

Computation and Language · Computer Science 2020-12-09 Yuqi Kong, Fanchao Meng, Benjamin Carterette

Semantic Measures for the Comparison of Units of Language, Concepts or Instances from Text and Knowledge Base Analysis

Semantic measures are widely used today to estimate the strength of the semantic relationship between elements of various types: units of language (e.g., words, sentences, documents), concepts or even instances semantically characterized…

Computation and Language · Computer Science 2016-10-25 Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain

Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with…

Computation and Language · Computer Science 2016-06-14 Pedro Mota, Maxine Eskenazi, Luisa Coheur

Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex treatments, most of which demand high cognitive skills (e.g. learning or decision processes).…

Artificial Intelligence · Computer Science 2017-04-19 Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain

Textual Spatial Cosine Similarity

Many methods exist today for measuring document similarity, such as cosine similarity. More complex methods are also available based on the semantic analysis of textual information, which are computationally expensive and rarely used in…

Information Retrieval · Computer Science 2015-05-18 Giancarlo Crocetti