Related papers: A Generic Distributed Clustering Framework for Mas…

Distributed Clustering based on Distributional Kernel

This paper introduces a new framework for clustering in a distributed network called Distributed Clustering based on Distributional Kernel (K) or KDC that produces the final clusters based on the similarity with respect to the distributions…

Machine Learning · Computer Science 2024-09-17 Hang Zhang, Yang Xu, Lei Gong, Ye Zhu, Kai Ming Ting

GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data

To effectively handle clustering task for large-scale datasets, we propose a novel scalable skeleton clustering algorithm, namely GBSK, which leverages the granular-ball technique to capture the underlying structure of data. By…

Machine Learning · Computer Science 2025-09-30 Yewang Chen, Junfeng Li, Shuyin Xia, Qinghong Lai, Xinbo Gao, Guoyin Wang, Dongdong Cheng, Yi Liu, Yi Wang

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Generative Mixture of Networks

A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K…

Machine Learning · Computer Science 2017-02-14 Ershad Banijamali, Ali Ghodsi, Pascal Poupart

Multi-View Spectral Clustering for Graphs with Multiple View Structures

Despite the fundamental importance of clustering, to this day, much of the relevant research is still based on ambiguous foundations, leading to an unclear understanding of whether or how the various clustering methods are connected with…

Machine Learning · Computer Science 2025-01-29 Yorgos Tsitsikas, Evangelos E. Papalexakis

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache, M-Tahar Kechadi

GBC: An Efficient and Adaptive Clustering Algorithm Based on Granular-Ball

Existing clustering methods are based on a single granularity of information, such as the distance and density of each data. This most fine-grained based approach is usually inefficient and susceptible to noise. Inspired by adaptive process…

Machine Learning · Computer Science 2023-03-03 Shuyin Xia, Jiang Xie, Guoyin Wang

Quegel: A General-Purpose Query-Centric Framework for Querying Big Graphs

Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems expose the user-friendly "think like a vertex" programming interface to users, and exhibit good horizontal…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-26 Da Yan, James Cheng, M. Tamer Özsu, Fan Yang, Yi Lu, John C. S. Lui, Qizhen Zhang, Wilfred Ng

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following…

Machine Learning · Computer Science 2020-01-28 Maria Florina Balcan, Steven Ehrlich, Yingyu Liang

CDC: A Simple Framework for Complex Data Clustering

In today's data-driven digital era, the amount as well as complexity, such as multi-view, non-Euclidean, and multi-relational, of the collected data are growing exponentially or even faster. Clustering, which unsupervisely extracts valid…

Machine Learning · Computer Science 2025-01-10 Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

initKmix -- A Novel Initial Partition Generation Algorithm for Clustering Mixed Data using k-means-based Clustering

Mixed datasets consist of both numeric and categorical attributes. Various k-means-based clustering algorithms have been developed for these datasets. Generally, these algorithms use random partition as a starting point, which tends to…

Machine Learning · Computer Science 2020-07-24 Amir Ahmad, Shehroz S. Khan

Clustering dynamics on graphs: from spectral clustering to mean shift through Fokker-Planck interpolation

In this work we build a unifying framework to interpolate between density-driven and geometry-based algorithms for data clustering, and specifically, to connect the mean shift algorithm with spectral clustering at discrete and continuum…

Machine Learning · Statistics 2021-10-22 Katy Craig, Nicolás García Trillos, Dejan Slepčev

Generative Kernel Spectral Clustering

Modern clustering approaches often trade interpretability for performance, particularly in deep learning-based methods. We present Generative Kernel Spectral Clustering (GenKSC), a novel model combining kernel spectral clustering with…

Machine Learning · Computer Science 2025-04-25 David Winant, Sonny Achten, Johan A. K. Suykens

Fast Distributed k-Means with a Small Number of Rounds

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Tom Hess, Ron Visbord, Sivan Sabato

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin, Mimi Zhang

Hulk: Graph Neural Networks for Optimizing Regionally Distributed Computing Systems

Large deep learning models have shown great potential for delivering exceptional results in various applications. However, the training process can be incredibly challenging due to the models' vast parameter sizes, often consisting of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-14 Zhengqing Yuan, Huiwen Xue, Chao Zhang, Yongming Liu

Clustering by the Probability Distributions from Extreme Value Theory

Clustering is an essential task to unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique…

Machine Learning · Computer Science 2022-02-22 Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu

Machine Learning for Genomic Data

This report explores the application of machine learning techniques on short time-series gene expression data. Although standard machine learning algorithms work well on longer time series, they often fail to find meaningful insights from…

Genomics · Quantitative Biology 2021-11-17 Akankshita Dash

Dalek: An Unconventional and Energy-Aware Heterogeneous Cluster

Dalek is an experimental compute cluster designed to evaluate the performance of heterogeneous, consumer-grade hardware for software design, prototyping, and algorithm development. In contrast to traditional computing centers that rely on…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-15 Adrien Cassagne, Noé Amiot, Manuel Bouyer