K-tree: Large Scale Document Clustering

Information Retrieval 2010-01-07 v1 Artificial Intelligence Data Structures and Algorithms

Abstract

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm and, unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and clustering quality with CLUTO on document collections. K-tree has a low time complexity, which makes it suitable for large document collections. Its tree structure allows for efficient disk-based implementations when space requirements exceed main memory.
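The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea that a hierarchy of cluster centroids lets a document vector be assigned to a cluster by greedy descent rather than by comparison against every cluster. The `Node` class, the `nearest_leaf` function, and the toy 2-D vectors are hypothetical illustrations and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance, Python 3.8+


@dataclass
class Node:
    """Hypothetical tree node: a centroid plus child nodes (empty for a leaf)."""
    centroid: tuple
    children: list = field(default_factory=list)


def nearest_leaf(root, vector):
    """Greedy descent: at each level follow the child whose centroid is closest."""
    node = root
    path = []
    while node.children:
        node = min(node.children, key=lambda c: dist(c.centroid, vector))
        path.append(node.centroid)
    return node, path


# Toy two-level hierarchy built by hand (assumed data, not from the paper).
leaves_a = [Node((0.0, 0.0)), Node((1.0, 1.0))]
leaves_b = [Node((9.0, 9.0)), Node((10.0, 10.0))]
root = Node((5.0, 5.0), [Node((0.5, 0.5), leaves_a), Node((9.5, 9.5), leaves_b)])

leaf, path = nearest_leaf(root, (8.7, 9.2))
print(leaf.centroid, path)
```

In this sketch each query compares the vector only against the children along one root-to-leaf path, which is the kind of saving a cluster hierarchy can offer over a flat k-means assignment step.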

Cite

@article{arxiv.1001.0830,
  title  = {K-tree: Large Scale Document Clustering},
  author = {Christopher M. De Vries and Shlomo Geva},
  journal= {arXiv preprint arXiv:1001.0830},
  year   = {2010}
}

Comments

2 pages, SIGIR 2009
