Related papers: Probabilistic Partitive Partitioning (PPP)

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require the unknown number of clusters a priori. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin , Mimi Zhang

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Partitioning Clustering algorithms for handling numerical and categorical data: a review

Clustering is widely used in different fields such as biology, psychology, and economics. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with…

Databases · Computer Science 2019-07-03 Trupti M. Kodinariya , Dr. Prashant R. Makwana

Efficient Large Scale Clustering based on Data Partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache , Nhien-An Le-Khac , M-Tahar Kechadi

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou , Kyle A. Gallivan , Adrian Barbu

A Mathematical Theory for Clustering in Metric Spaces

Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering…

Machine Learning · Computer Science 2016-02-22 Cheng-Shang Chang , Wanjiun Liao , Yu-Sheng Chen , Li-Heng Liou

A Unifying Family of Data-Adaptive Partitioning Algorithms

Clustering algorithms remain valuable tools for grouping and summarizing the most important aspects of data. Example areas where this is the case include image segmentation, dimension reduction, signals analysis, model order reduction,…

Numerical Analysis · Mathematics 2024-12-24 Guy B. Oldaker , Maria Emelianenko

Memory Enriched Big Bang Big Crunch Optimization Algorithm for Data Clustering

Cluster analysis plays an important role in the decision-making process for many knowledge-based systems. There exists a wide variety of approaches to clustering applications, including heuristic techniques, probabilistic models,…

Artificial Intelligence · Computer Science 2017-03-09 Kayvan Bijari , Hadi Zare , Hadi Veisi , Hossein Bobarshad

Differentially-Private Clustering of Easy Instances

Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. Despite significant research progress, the…

Machine Learning · Computer Science 2021-12-30 Edith Cohen , Haim Kaplan , Yishay Mansour , Uri Stemmer , Eliad Tsfadia

Neural Capacitated Clustering

Recent work on deep clustering has also found promising new methods for constrained clustering problems. Their typically pairwise constraints can often be used to guide the partitioning of the data. Many problems, however, feature…

Machine Learning · Computer Science 2023-05-22 Jonas K. Falkner , Lars Schmidt-Thieme

A Tutorial on Discriminative Clustering and Mutual Information

To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as "cohesive…

Machine Learning · Statistics 2025-05-08 Louis Ohl , Pierre-Alexandre Mattei , Frédéric Precioso

Clustering based on Mixtures of Sparse Gaussian Processes

Creating low dimensional representations of a high dimensional data set is an important component in many machine learning applications. How to cluster data using their low dimensional embedded space is still a challenging problem in…

Machine Learning · Computer Science 2023-03-27 Zahra Moslehi , Abdolreza Mirzaei , Mehran Safayani

Clustering by Constructing Hyper-Planes

As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Luhong Diao , Jinying Gao , Manman Deng

Comparison three methods of clustering: k-means, spectral clustering and hierarchical clustering

A comparison of three kinds of clustering, finding and calculating the cost function and loss function of each. The error rate of a clustering method, and how to calculate the error percentage, is always one of the important factors for evaluating the…

Machine Learning · Computer Science 2014-11-14 Kamran Kowsari

Clustering Binary Data by Application of Combinatorial Optimization Heuristics

We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. Five new and original methods are introduced, using neighborhoods and population behavior combinatorial optimization…

Machine Learning · Statistics 2020-01-08 Javier Trejos-Zelaya , Luis Eduardo Amaya-Briceño , Alejandra Jiménez-Romero , Alex Murillo-Fernández , Eduardo Piza-Volio , Mario Villalobos-Arias

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining to generate groups of interest. Among the various clustering approaches, the families of k-means and min-cut algorithms have gained the most popularity due to their…

Machine Learning · Computer Science 2014-11-25 Xiaojun Chang , Feiping Nie , Zhigang Ma , Yi Yang

Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions

We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity…

Methodology · Statistics 2022-01-19 Luca Insolia , Domenico Perrotta

Neural Clustering Processes

Probabilistic clustering models (or equivalently, mixture models) are basic building blocks in countless statistical models and involve latent random variables over discrete spaces. For these models, posterior inference methods can be…

Machine Learning · Statistics 2020-06-24 Ari Pakman , Yueqi Wang , Catalin Mitelut , JinHyung Lee , Liam Paninski