Related papers: A clustering algorithm for multivariate data strea…

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

An efficient K -means clustering algorithm for massive data

The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin , Mimi Zhang

On the clustering of correlated random variables

In this work, the possibility of clustering correlated random variables was examined, both because of their mutual similarity and because of their similarity to the principal components. The k-means algorithm and spectral algorithms were…

Machine Learning · Computer Science 2019-09-10 Zenon Gniazdowski , Dawid Kaliszewski

Overview of streaming-data algorithms

Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipments, and other…

Databases · Computer Science 2012-03-12 T Soni Madhulatha

Clustering Categorical Data Streams

The data stream model has been defined for new classes of applications involving massive data being generated at a fast pace. Web click stream analysis and detection of network intrusions are two examples. Cluster analysis on data streams…

Databases · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng , Joshua Zhexue Huang

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

A hybrid clustering algorithm for data mining

Data clustering is a process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is better than among groups. In this paper a hybrid clustering…

Databases · Computer Science 2012-05-25 Ravindra Jain

Data Stream Clustering: A Review

Number of connected devices is steadily increasing and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for…

Machine Learning · Computer Science 2020-07-22 Alaettin Zubaroğlu , Volkan Atalay

A Generic Framework for Fair Consensus Clustering in Streams

Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best…

Machine Learning · Computer Science 2026-02-13 Diptarka Chakraborty , Kushagra Chatterjee , Debarati Das , Tien-Long Nguyen

Achieving Approximate Soft Clustering in Data Streams

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process…

Data Structures and Algorithms · Computer Science 2015-03-20 Vaneet Aggarwal , Shankar Krishnan

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev , Rustam Mussabayev

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache , M-Tahar Kechadi

Data Stream Clustering: Challenges and Issues

Very large databases are required to store massive amounts of data that are continuously inserted and queried. Analyzing huge data sets and extracting valuable pattern in many applications are interesting for researchers. We can identify…

Databases · Computer Science 2010-06-29 Madjid Khalilian , Norwati Mustapha

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…

Methodology · Statistics 2014-07-11 Eric Bair

Online Clustering of Known and Emerging Malware Families

Malware attacks have become significantly more frequent and sophisticated in recent years. Therefore, malware detection and classification are critical components of information security. Due to the large amount of malware samples…

Cryptography and Security · Computer Science 2024-05-07 Olha Jurečková , Martin Jureček , Mark Stamp

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev

Correlation Clustering in Data Streams

Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms…

Data Structures and Algorithms · Computer Science 2018-12-06 Kook Jin Ahn , Graham Cormode , Sudipto Guha , Andrew McGregor , Anthony Wirth

Online Clustering by Penalized Weighted GMM

With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike the…

Machine Learning · Computer Science 2019-02-08 Shlomo Bugdary , Shay Maymon

A novel k-means clustering approach using two distance measures for Gaussian data

Clustering algorithms have long been the topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarity and structure within raw data, this paper…

Machine Learning · Computer Science 2025-11-25 Naitik Gada