Related papers: Random Indexing K-tree

K-tree: Large Scale Document Clustering

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries, Shlomo Geva

Document Clustering with K-tree

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries, Shlomo Geva

Accelerating k-Means Clustering with Cover Trees

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang, Erich Schubert

A density peaks clustering algorithm with sparse search and K-d tree

Density peaks clustering has become a nova of clustering algorithm because of its simplicity and practicality. However, there is one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks…

Machine Learning · Statistics 2022-07-21 Yunxiao Shan, Shu Li, Fuxiang Li, Yuxin Cui, Shuai Li, Ming Zhou, Xiang Li

F-tree: an algorithm for clustering transactional data using frequency tree

Clustering is an important data mining technique that groups similar data records. Recently, categorical transaction clustering has received more attention. In this research, we study the problem of categorical data clustering for…

Databases · Computer Science 2017-05-03 Mahmoud Mahdi, Samir Abdelrahman, Reem Bahgat, Ismail Ismail

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian

Tree Index: A New Cluster Evaluation Technique

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…

Machine Learning · Computer Science 2020-03-25 A. H. Beg, Md Zahidul Islam, Vladimir Estivill-Castro

A H-K Clustering Algorithm For High Dimensional Data Using Ensemble Learning

Advances made to traditional clustering algorithms solve various problems, such as the curse of dimensionality and sparsity of data for multiple attributes. The traditional H-K clustering algorithm can solve the randomness and apriority…

Databases · Computer Science 2015-01-13 Rashmi Paithankar, Bharat Tidke

A Novel Algorithm for Informative Meta Similarity Clusters Using Minimum Spanning Tree

The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two minimum spanning tree based clustering algorithms. The first algorithm produces k clusters with center…

Other Computer Science · Computer Science 2010-05-26 S. John Peter, S. P. Victor

Dual-tree $k$-means with bounded iteration runtime

k-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyd's algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of…

Data Structures and Algorithms · Computer Science 2016-01-18 Ryan R. Curtin

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast and high quality document clustering is an important task in organizing information and search engine results obtained from user queries, and in enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar

Document Retrieval on Repetitive String Collections

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…

Information Retrieval · Computer Science 2017-05-22 Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

The Effect of Points Dispersion on the $k$-nn Search in Random Projection Forests

Partitioning trees are efficient data structures for $k$-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called $k$d-trees to perform $k$-nn search. Unfortunately, $k$d-trees can be…

Machine Learning · Computer Science 2023-02-28 Mashaan Alshammari, John Stavrakakis, Adel F. Ahmed, Masahiro Takatsuka

Automated Document Indexing via Intelligent Hierarchical Clustering: A Novel Approach

With the rising quantity of textual data available in electronic format, the need to organize it has become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul, Shubham Rohan Asthana, Sanjay Kumar Sahay

The "AI+R"-tree: An Instance-optimized R-tree

The emerging class of instance-optimized systems has shown potential to achieve high performance by specializing to specific data and query workloads. Particularly, Machine Learning (ML) techniques have been applied successfully to build…

Databases · Computer Science 2022-07-04 Abdullah-Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang, Walid G. Aref

Fast k-NN search

Efficient index structures for fast approximate nearest neighbor queries are required in many applications such as recommendation systems. In high-dimensional spaces, many conventional methods suffer from excessive usage of memory and slow…

Machine Learning · Statistics 2019-04-24 Ville Hyvönen, Teemu Pitkänen, Sotiris Tasoulis, Elias Jääsaari, Risto Tuomainen, Liang Wang, Jukka Corander, Teemu Roos

Cluster-Based Information Retrieval by using (K-means)- Hierarchical Parallel Genetic Algorithms Approach

Cluster-based information retrieval is one of the information retrieval (IR) tools that organize, extract features from, and categorize web documents according to their similarity. Unlike traditional approaches, cluster-based IR is fast in…

Artificial Intelligence · Computer Science 2020-08-04 Sarah Hussein Toman, Mohammed Hamzah Abed, Zinah Hussein Toman

In-memory Multidimensional Indexing Using the skd-tree

In this paper, we revisit the problem of indexing multi-dimensional data in memory for the efficient support of multi-dimensional range queries and nearest neighbor queries. This is a classic problem in main-memory databases, where there is…

Databases · Computer Science 2026-05-06 Achilleas Michalopoulos, Dimitrios Tsitsigkos, Nikos Mamoulis

Strong Consistency of Reduced K-means Clustering

Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both clustering of objects and low-dimensional subspace reflecting the cluster structure are simultaneously…

Statistics Theory · Mathematics 2014-02-14 Yoshikazu Terada

A Propound Method for the Improvement of Cluster Quality

In this paper, the Knockout Refinement Algorithm (KRA) is proposed to refine original clusters obtained by applying SOM and K-Means clustering algorithms. KRA is based on Contingency Table concepts. Metrics are computed for the…

Machine Learning · Computer Science 2013-07-26 Shveta Kundra Bhatia, V. S. Dixit