K-tree: Large Scale Document Clustering
Information Retrieval
Artificial Intelligence
Data Structures and Algorithms
2010-01-07 v1
Abstract
We introduce K-tree in an information retrieval context. K-tree is an efficient approximation of the k-means clustering algorithm that, unlike k-means, forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and quality to CLUTO on document collections. K-tree has a low time complexity, which makes it suitable for large document collections. Its tree structure allows for efficient disk-based implementations when space requirements exceed main memory.
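The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a height-balanced tree of order `m` in which leaves hold document vectors, internal nodes hold the centroids of their children, and an overfull node is split in two with k-means (k = 2). The names `Node`, `KTree`, `two_means`, and `order` are hypothetical, the vectors are dense Python lists, and the sparse-representation extension and disk-based storage mentioned above are not covered.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(vectors, iters=10, seed=0):
    """Split indices of `vectors` into two non-empty groups with plain 2-means."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, v in enumerate(vectors):
            nearest = 0 if dist2(v, centers[0]) <= dist2(v, centers[1]) else 1
            groups[nearest].append(i)
        if not groups[0] or not groups[1]:
            # Degenerate case (e.g. duplicate vectors): fall back to an even split.
            half = len(vectors) // 2
            return list(range(half)), list(range(half, len(vectors)))
        centers = [[sum(col) / len(g) for col in zip(*(vectors[i] for i in g))]
                   for g in groups]
    return groups

class Node:
    """Leaf: keys are data vectors. Internal: keys are child centroids."""
    def __init__(self, keys=None, weights=None, children=None):
        self.keys = keys or []
        self.weights = weights or []   # number of data vectors under each key
        self.children = children       # None marks a leaf

    def weight(self):
        return sum(self.weights)

    def centroid(self):
        total = self.weight()
        return [sum(w * x for w, x in zip(self.weights, col)) / total
                for col in zip(*self.keys)]

    def subset(self, idx):
        kids = None if self.children is None else [self.children[i] for i in idx]
        return Node([self.keys[i] for i in idx],
                    [self.weights[i] for i in idx], kids)

def _insert(node, v, order):
    """Insert v below node; return None, or two nodes if this node had to split."""
    if node.children is None:
        node.keys.append(v)
        node.weights.append(1)
    else:
        # Follow the nearest centroid down the tree.
        i = min(range(len(node.keys)), key=lambda j: dist2(v, node.keys[j]))
        split = _insert(node.children[i], v, order)
        parts = [node.children[i]] if split is None else list(split)
        # Refresh (or replace) the entry for the child that changed.
        node.children[i:i + 1] = parts
        node.keys[i:i + 1] = [p.centroid() for p in parts]
        node.weights[i:i + 1] = [p.weight() for p in parts]
    if len(node.keys) <= order:
        return None
    left, right = two_means(node.keys)  # unweighted split, for simplicity
    return node.subset(left), node.subset(right)

class KTree:
    def __init__(self, order):
        self.order = order
        self.root = Node()

    def insert(self, v):
        split = _insert(self.root, list(v), self.order)
        if split is not None:
            # A root split grows the tree by one level, keeping it balanced.
            self.root = Node([p.centroid() for p in split],
                             [p.weight() for p in split], list(split))

# Usage: cluster 200 random 2-D points into a tree of order 4.
rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(200)]
tree = KTree(order=4)
for p in points:
    tree.insert(p)
```

Each insertion follows one root-to-leaf path and splits at most one node per level, which is where the low per-document cost of a structure like this would come from. The centroids stored at each level can be read as a clustering at that granularity, giving the hierarchy that flat k-means lacks.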
Cite
@article{arxiv.1001.0830,
title = {K-tree: Large Scale Document Clustering},
author = {Christopher M. De Vries and Shlomo Geva},
journal = {arXiv preprint arXiv:1001.0830},
year = {2010}
}
Comments
2 pages, SIGIR 2009