Related papers: Clustering Unstructured Data (Flat Files) - An Imp…

Document Clustering using K-Means and K-Medoids

With the huge upsurge of information in day-to-day life, it has become difficult to assemble relevant information in the nick of time. But people are always short of time and need everything quickly. Hence clustering was introduced to…

Information Retrieval · Computer Science 2015-03-02 Rakesh Chandra Balabantaray, Chandrali Sarma, Monica Jha

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar

Accessing accurate documents by mining auxiliary document information

Earlier techniques of text mining included algorithms like k-means, Naive Bayes, SVM which classify and cluster the text document for mining relevant information about the documents. The need for improving the mining techniques has us…

Information Retrieval · Computer Science 2016-05-10 Jinju Joby, Jyothi Korra

Document Clustering using K-Medoids

People are always in search of matters for which they are prone to use internet, but again it has huge assemblage of data due to which it becomes difficult for the reader to get the most accurate data. To make it easier for people to gather…

Information Retrieval · Computer Science 2015-04-07 Monica Jha

Efficient Big Text Data Clustering Algorithms using Hadoop and Spark

Document clustering is a traditional, efficient and yet quite effective, text mining technique when we need to get a better insight of the documents of a collection that could be grouped together. The K-Means algorithm and the Hierarchical…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-02 Sergios Gerakidis, Sofia Megarchioti, Basilis Mamalis

A Survey on optimization approaches to text document clustering

Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that…

Information Retrieval · Computer Science 2014-01-13 R. Jensi, Dr. G. Wiselin Jiji

K-tree: Large Scale Document Clustering

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries, Shlomo Geva

Document Clustering with K-tree

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries, Shlomo Geva

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining to generate groups that are the matter of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms gain most popularity due to their…

Machine Learning · Computer Science 2014-11-25 Xiaojun Chang, Feiping Nie, Zhigang Ma, Yi Yang

Automated Document Indexing via Intelligent Hierarchical Clustering: A Novel Approach

With the rising quantity of textual data available in electronic format, the need to organize it has become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul, Shubham Rohan Asthana, Sanjay Kumar Sahay

An efficient $k$-means-type algorithm for clustering datasets with incomplete records

The $k$-means algorithm is arguably the most popular nonparametric clustering method but cannot generally be applied to datasets with incomplete records. The usual practice then is to either impute missing values under an assumed…

Machine Learning · Statistics 2018-09-11 Andrew Lithio, Ranjan Maitra

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Ontology Based Document Clustering Using MapReduce

Nowadays, document clustering is considered as a data intensive task due to the dramatic, fast increase in the number of available documents. Nevertheless, the features that represent those documents are also too large. The most common…

Databases · Computer Science 2015-05-13 Abdelrahman Elsayed, Hoda M. O. Mokhtar, Osama Ismail

Estimating the Effective Topics of Articles and journals Abstract Using LDA And K-Means Clustering Algorithm

Analyzing journals and articles abstract text or documents using topic modelling and text clustering has become a modern solution for the increasing number of text documents. Topic modelling and text clustering are both intensely involved…

Information Retrieval · Computer Science 2025-08-25 Shadikur Rahman, Umme Ayman Koana, Aras M. Ismael, Karmand Hussein Abdalla

An Analytical Study on Behavior of Clusters Using K Means, EM and K* Means Algorithm

Clustering is an unsupervised learning method that constitutes a cornerstone of an intelligent data analysis process. It is used for the exploration of inter-relationships among a collection of patterns, by organizing them into homogeneous…

Machine Learning · Computer Science 2010-04-13 G. Nathiya, S. C. Punitha, M. Punithavalli

Document Clustering based on Topic Maps

Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collection of documents like World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace, Kalyani Desikan

Text Mining using Nonnegative Matrix Factorization and Latent Semantic Analysis

Text clustering is arguably one of the most important topics in modern data mining. Nevertheless, text data require tokenization which usually yields a very large and highly sparse term-document matrix, which is usually difficult to process…

Machine Learning · Computer Science 2020-02-25 Ali Hassani, Amir Iranmanesh, Najme Mansouri

Transformed K-means Clustering

In this work we propose a clustering framework based on the paradigm of transform learning. In simple terms the representation from transform learning is used for K-means clustering; however, the problem is not solved in such a naïve…

Machine Learning · Computer Science 2021-11-30 Anurag Goel, Angshul Majumdar

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin, Mimi Zhang