Vec2GC -- A Graph Based Clustering Method for Text Representations
Information Retrieval
2023-04-13 v2 Machine Learning
Abstract
NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline for clustering the terms or documents of any given text corpus. Our method applies community detection to a weighted graph of the terms or documents, built from learned text representations. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
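The general idea of the pipeline (embeddings, then a weighted graph, then communities) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption: the abstract does not name the similarity measure, the edge-weighting scheme, or the community detection algorithm. The sketch uses cosine similarity as the edge weight, a hypothetical cutoff `threshold=0.8`, and a simple deterministic weighted label propagation as a stand-in for community detection. The function names `build_graph` and `label_propagation` are illustrative only.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(vectors, threshold=0.8):
    """Connect items i and j when their cosine similarity is >= threshold.

    Assumption: the similarity itself is used as the edge weight.
    Returns a dict {node: {neighbor: weight}}.
    """
    adj = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                adj[i][j] = sim
                adj[j][i] = sim
    return adj


def label_propagation(adj, max_iter=100):
    """Stand-in community detection: weighted label propagation.

    Nodes are visited in a fixed order and each adopts the label with the
    largest total edge weight among its neighbors (ties go to the smallest
    label). Isolated nodes keep their own label and form singleton clusters.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue
            weight_per_label = defaultdict(float)
            for neighbor, weight in adj[node].items():
                weight_per_label[labels[neighbor]] += weight
            best = min(weight_per_label, key=lambda lab: (-weight_per_label[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, lab in labels.items():
        clusters[lab].append(node)
    return sorted(sorted(members) for members in clusters.values())
```

Calling `label_propagation(build_graph(vectors))` on a list of embedding vectors returns the clusters as lists of item indices. Raising the threshold keeps only denser neighborhoods connected, which is one plausible way to read the density-based behavior the abstract mentions; items with no sufficiently similar neighbor end up in singleton clusters.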
Cite
@article{arxiv.2104.09439,
title = {Vec2GC -- A Graph Based Clustering Method for Text Representations},
author = {Rajesh N Rao and Manojit Chakraborty},
journal = {arXiv preprint arXiv:2104.09439},
year = {2023}
}
Comments
5 pages, 1 figure