Related papers: K-tree: Large Scale Document Clustering

Document Clustering with K-tree

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries , Shlomo Geva

Random Indexing K-tree

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are…

Information Retrieval · Computer Science 2010-02-02 Christopher M. De Vries , Lance De Vine , Shlomo Geva

Accelerating k-Means Clustering with Cover Trees

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang , Erich Schubert

Clustering Unstructured Data (Flat Files) - An Implementation in Text Mining Tool

With the advancement of technology and reduced storage costs, individuals and organizations are tending towards the usage of electronic media for storing textual information and documents. It is time consuming for readers to retrieve…

Information Retrieval · Computer Science 2010-07-27 Yasir Safeer , Atika Mustafa , Anis Noor Ali

Document Clustering using K-Means and K-Medoids

With the huge upsurge of information in day-to-days life, it has become difficult to assemble relevant information in nick of time. But people, always are in dearth of time, they need everything quick. Hence clustering was introduced to…

Information Retrieval · Computer Science 2015-03-02 Rakesh Chandra Balabantaray , Chandrali Sarma , Monica Jha

Approximate k-nearest neighbour based spatial clustering using k-d tree

Different spatial objects that vary in their characteristics, such as molecular biology and geography, are presented in spatial areas. Methods to organize, manage, and maintain those objects in a structured manner are required. Data mining…

Databases · Computer Science 2013-03-11 Dr. Mohammed Otair

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Efficient Sparse Spherical k-Means for Document Clustering

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which…

Machine Learning · Computer Science 2021-08-03 Johannes Knittel , Steffen Koch , Thomas Ertl

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan , Harish Verma , Eatesh Kandpal , Joydip Dhar

Explaining Kernel Clustering via Decision Trees

Despite the growing popularity of explainable and interpretable machine learning, there is still surprisingly limited work on inherently interpretable clustering methods. Recently, there has been a surge of interest in explaining the…

Machine Learning · Computer Science 2024-11-26 Maximilian Fleissner , Leena Chennuru Vankadara , Debarghya Ghoshdastidar

A comparison of two suffix tree-based document clustering algorithms

Document clustering as an unsupervised approach extensively used to navigate, filter, summarize and manage large collection of document repositories like the World Wide Web (WWW). Recently, focuses in this domain shifted from traditional…

Information Retrieval · Computer Science 2012-01-11 Muhammad Rafi , M. Maujood , M. M. Fazal , S. M. Ali

Strong Consistency of Reduced K-means Clustering

Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both clustering of objects and low-dimensional subspace reflecting the cluster structure are simultaneously…

Statistics Theory · Mathematics 2014-02-14 Yoshikazu Terada

Using Multi-Core HW/SW Co-design Architecture for Accelerating K-means Clustering Algorithm

The capability of classifying and clustering a desired set of data is an essential part of building knowledge from data. However, as the size and dimensionality of input data increases, the run-time for such clustering algorithms is…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-25 Hadi Mardani Kamali

Dual-tree $k$-means with bounded iteration runtime

k-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyd's algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of…

Data Structures and Algorithms · Computer Science 2016-01-18 Ryan R. Curtin

Document Clustering using K-Medoids

People are always in search of matters for which they are prone to use internet, but again it has huge assemblage of data due to which it becomes difficult for the reader to get the most accurate data. To make it easier for people to gather…

Information Retrieval · Computer Science 2015-04-07 Monica Jha

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta , Rong Jin , Timothy C. Havens , Anil K. Jain

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev , Rustam Mussabayev

A density peaks clustering algorithm with sparse search and K-d tree

Density peaks clustering has become a nova of clustering algorithm because of its simplicity and practicality. However, there is one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks…

Machine Learning · Statistics 2022-07-21 Yunxiao Shan , Shu Li , Fuxiang Li , Yuxin Cui , Shuai Li , Ming Zhou , Xiang Li

Automated Document Indexing via Intelligent Hierarchical Clustering: A Novel Approach

With the rising quantity of textual data available in electronic format, the need to organize it become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul , Shubham Rohan Asthana , Sanjay Kumar Sahay