Related papers: A Topological Method for Comparing Document Semant…

Evolution of Semantic Similarity -- A Survey

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods…

Computation and Language · Computer Science 2021-02-24 Dhivya Chandrasekaran, Vijay Mago

Terminology-based Text Embedding for Computing Document Similarities on Technical Content

We propose in this paper a new, hybrid document embedding approach in order to address the problem of document similarities with respect to the technical content. To do so, we employ state-of-the-art graph techniques to first extract the…

Computation and Language · Computer Science 2019-07-02 Hamid Mirisaee, Eric Gaussier, Cedric Lagnier, Agnes Guerraz

Calculating Semantic Similarity between Academic Articles using Topic Event and Ontology

Determining semantic similarity between academic documents is crucial to many tasks such as plagiarism detection, automatic technical survey and semantic search. Current studies mostly focus on semantic similarity between concepts,…

Computation and Language · Computer Science 2017-12-01 Ming Liu, Bo Lang, Zepeng Gu

A Comparison of Document Similarity Algorithms

Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity algorithm could have a major…

Computation and Language · Computer Science 2023-04-05 Nicholas Gahman, Vinayak Elangovan

Matching Handwritten Document Images

We address the problem of predicting similarity between a pair of handwritten document images written by different individuals. This has applications related to matching and mining in image collections containing handwritten content. A…

Computer Vision and Pattern Recognition · Computer Science 2016-05-20 Praveen Krishnan, C. V. Jawahar

Methods for Computing Legal Document Similarity: A Comparative Study

Computing similarity between two legal documents is an important and challenging task in the domain of Legal Information Retrieval. Finding similar legal documents has many applications in downstream tasks, including prior-case retrieval,…

Social and Information Networks · Computer Science 2020-04-28 Paheli Bhattacharya, Kripabandhu Ghosh, Arindam Pal, Saptarshi Ghosh

SimDoc: Topic Sequence Alignment based Document Similarity Framework

Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise relevant tasks such as document clustering,…

Computation and Language · Computer Science 2017-11-15 Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann

Textual Spatial Cosine Similarity

Many methods exist today for dealing with document similarity, such as cosine similarity. More complex methods are also available based on the semantic analysis of textual information, which are computationally expensive and rarely used in…

Information Retrieval · Computer Science 2015-05-18 Giancarlo Crocetti

Document Retrieval using Predication Similarity

Document retrieval has been an important research problem over many years in the information retrieval community. State-of-the-art techniques utilize various methods in matching documents to a given document including keywords, phrases, and…

Information Retrieval · Computer Science 2016-04-21 Kalpa Gunaratna

A Topological Approach to Compare Document Semantics Based on a New Variant of Syntactic N-grams

This paper delivers a new perspective of thinking and utilizing syntactic n-grams (sn-grams). Sn-grams are a type of non-linear n-grams which have been playing a critical role in many NLP tasks. Introducing sn-grams to comparing document…

Computation and Language · Computer Science 2021-03-10 Fanchao Meng

A Novel Method of Extracting Topological Features from Word Embeddings

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high-dimensional noisy data. While text representations are often high-dimensional and noisy, there are only a few works on the…

Machine Learning · Computer Science 2020-04-21 Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Measuring similarity between texts is an important task for several applications. Available approaches to measure document similarity are inadequate for document pairs that have non-comparable lengths, such as a long document and its…

Computation and Language · Computer Science 2019-03-27 Hongyu Gong, Tarek Sakakini, Suma Bhat, Jinjun Xiong

Multi-document Summarization by Graph Search and Matching

We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as…

cmp-lg · Computer Science 2007-05-23 Inderjeet Mani, Eric Bloedorn

Topological Sort for Sentence Ordering

Sentence ordering is the task of arranging the sentences of a given text in the correct order. Recent work using deep neural networks for this task has framed it as a sequence prediction problem. In this paper, we propose a new framing of…

Computation and Language · Computer Science 2020-05-04 Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black

Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex treatments -- most of which demand high cognitive skills (e.g. learning or decision processes).…

Artificial Intelligence · Computer Science 2017-04-19 Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Such aligned data can be used for a variety of NLP tasks from training cross-lingual…

Computation and Language · Computer Science 2020-10-13 Ahmed El-Kishky, Francisco Guzmán

Content-based Text Categorization using Wikitology

A major computational burden, while performing document clustering, is the calculation of the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science 2012-08-20 Muhammad Rafi, Sundus Hassan, Mohammad Shahid Shaikh

An Efficient Technique for Similarity Identification between Ontologies

Ontologies usually suffer from semantic heterogeneity when simultaneously used in information sharing, merging, integrating and querying processes. Therefore, the similarity identification between ontologies being used becomes a…

Artificial Intelligence · Computer Science 2010-06-24 Amjad Farooq, Syed Ahsan, Abad Shah

Semantic classifier approach to document classification

In this paper we propose a new document classification method, bridging discrepancies (the so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text…

Information Retrieval · Computer Science 2017-01-17 Piotr Borkowski, Krzysztof Ciesielski, Mieczysław A. Kłopotek

Multi-Image Semantic Matching by Mining Consistent Features

This work proposes a multi-image matching method to estimate semantic correspondences across multiple images. In contrast to the previous methods that optimize all pairwise correspondences, the proposed method identifies and matches only a…

Computer Vision and Pattern Recognition · Computer Science 2018-05-02 Qianqian Wang, Xiaowei Zhou, Kostas Daniilidis