Vec2GC -- A Graph Based Clustering Method for Text Representations

Information Retrieval 2023-04-13 v2 Machine Learning

Abstract

NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
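The general idea in the abstract (embed items, connect similar ones in a weighted graph, then detect communities) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's method: the abstract does not name the similarity measure, the edge weighting, the threshold, or the community detection algorithm. The sketch uses cosine similarity with a hypothetical threshold of 0.9, uses the similarity itself as the edge weight, and swaps in a simple weighted label propagation as a stand-in community detector. The embeddings are toy 2-D vectors rather than learned text representations, and hierarchical clustering is not covered.

```python
import math
from collections import defaultdict

# Toy 2-D "embeddings" (assumption: real inputs would be learned text vectors).
EXAMPLE_VECTORS = [
    (1.0, 0.0), (0.95, 0.1), (0.9, 0.05),   # items pointing along the x axis
    (0.0, 1.0), (0.1, 0.95), (0.05, 0.9),   # items pointing along the y axis
]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, threshold):
    """Weighted adjacency map: edge i-j exists when similarity >= threshold."""
    graph = defaultdict(dict)
    for i in range(len(vectors)):
        graph[i]  # touch the key so isolated nodes still appear in the graph
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim  # assumption: edge weight is the similarity
                graph[j][i] = sim
    return graph

def label_propagation(graph, max_rounds=20):
    """Stand-in community detector: each node adopts the label with the
    largest total edge weight among its neighbours (ties go to the smaller
    label), repeated until no label changes or max_rounds is reached."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            score = defaultdict(float)
            for neighbour, weight in graph[node].items():
                score[labels[neighbour]] += weight
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

def cluster(vectors, threshold=0.9):
    """Return clusters as sorted lists of item indices."""
    labels = label_propagation(build_graph(vectors, threshold))
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())

if __name__ == "__main__":
    print(cluster(EXAMPLE_VECTORS))
```

The threshold acts as a crude density control in this sketch: raising it removes weaker edges, so only tightly grouped items remain connected and items without any sufficiently similar neighbour end up as singletons.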

Cite

@article{arxiv.2104.09439,
  title  = {Vec2GC -- A Graph Based Clustering Method for Text Representations},
  author = {Rajesh N Rao and Manojit Chakraborty},
  journal= {arXiv preprint arXiv:2104.09439},
  year   = {2023}
}

Comments

5 pages, 1 figure
