A Topological Method for Comparing Document Semantics
Abstract
Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task remain rare, and most relevant methods are devised from statistical or vector space model perspectives, with almost none taking a topological perspective. In this paper, we take a different approach: we propose a novel algorithm, based on topological persistence, for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset with human judgments. A collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces results highly consistent with human judgments and outperforms most state-of-the-art methods, though it ties with NLTK.
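The abstract does not spell out the algorithm. As a rough illustration of what "topological persistence" means for a set of points, the Python sketch below computes 0-dimensional persistence under a Vietoris-Rips filtration: every point is born at radius 0, and connected components die as edges of increasing length merge them, which makes the death times equal to the minimum-spanning-tree edge lengths (found here with Kruskal's algorithm). It assumes each document has already been turned into a point cloud, for instance a set of word vectors. The names `zero_dim_persistence` and `diagram_gap`, the toy coordinates, and the L1 comparison of sorted death times are hypothetical and are not taken from the paper; the comparison in particular is a crude stand-in, not a bottleneck or Wasserstein distance.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Death times of 0-dimensional features in a Vietoris-Rips filtration.

    Each point is born at radius 0. An edge enters the filtration at its
    length, so components die exactly when Kruskal's algorithm merges them;
    the death times are therefore the minimum-spanning-tree edge lengths.
    The single component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj      # two components merge: one of them dies
            deaths.append(length)
    return deaths


def diagram_gap(a, b):
    """Hypothetical score: L1 gap between sorted death times, zero-padded."""
    n = max(len(a), len(b))
    a = sorted(a, reverse=True) + [0.0] * (n - len(a))
    b = sorted(b, reverse=True) + [0.0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical toy "documents" as 2-D point clouds.
doc_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
doc_b = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

if __name__ == "__main__":
    print(zero_dim_persistence(doc_a))  # [1.0, 1.0]
    print(zero_dim_persistence(doc_b))  # [3.0, 4.0]
    print(diagram_gap(zero_dim_persistence(doc_a), zero_dim_persistence(doc_b)))  # 5.0
```

A smaller score means the two point clouds merge into one component at more similar scales; a real comparison would more likely use an established diagram distance and higher-dimensional features as well.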
Cite
@article{arxiv.2012.04203,
title = {A Topological Method for Comparing Document Semantics},
author = {Yuqi Kong and Fanchao Meng and Benjamin Carterette},
journal = {arXiv preprint arXiv:2012.04203},
year = {2020}
}
Comments
9 pages, 3 tables, 9th International Conference on Natural Language Processing (NLP 2020)