Related papers: Scalable Density-Based Distributed Clustering

On a Distributed Approach for Density-based Clustering

Efficient extraction of useful knowledge from these data is still a challenge, mainly when the data is distributed, heterogeneous and of different quality depending on its corresponding local infrastructure. To reduce the overhead cost,…

Databases · Computer Science 2017-04-17 Nhien-An Le-Khac, M-Tahar Kechadi

Scalable Varied-Density Clustering via Graph Propagation

We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based…

Machine Learning · Computer Science 2025-08-06 Ninh Pham, Yingtao Zheng, Hugo Phibbs

Efficient Large Scale Clustering based on Data Partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Reliable Distributed Clustering with Redundant Data Assignment

In this paper, we present distributed generalized clustering algorithms that can handle large scale data across multiple machines in spite of straggling or unreliable machines. We propose a novel data assignment scheme that enables us to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-17 Venkata Gandikota, Arya Mazumdar, Ankit Singh Rawat

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with the data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Distributed Lance-William Clustering Algorithm

One important tool is the optimal clustering of data into useful categories. Dividing similar objects into a smaller number of clusters is of importance in many applications. These include search engines, monitoring of academic performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-21 Gavriel Yarmish, Philip Listowsky, Simon Dexter

Variance-based Clustering Technique for Distributed Data Mining Applications

Nowadays, huge amounts of data are naturally collected at distributed sites due to different factors, and moving these data through the network to extract useful knowledge is almost infeasible for either technical or policy reasons.…

Databases · Computer Science 2017-03-30 Lamine M. Aouad, Nhien-An Le-Khac, Tahar Kechadi

On Distributed Algorithms for Cost-Efficient Data Center Placement in Cloud Computing

The increasing popularity of cloud computing has resulted in a proliferation of data centers. Effective placement of data centers improves network performance and minimizes clients' perceived latency. The problem of determining the optimal…

Networking and Internet Architecture · Computer Science 2018-02-06 Wuqiong Luo, Wee Peng Tay, Peng Sun, Yonggang Wen

Revisiting Large Scale Distributed Machine Learning

Nowadays, with the widespread use of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available and the focus of machine learning has shifted from being able to infer from small training samples to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-07 Radu Cristian Ionescu

SACA: Selective Attention-Based Clustering Algorithm

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…

Machine Learning · Computer Science 2025-12-01 Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

A Distributed Clustering Algorithm for Dynamic Networks

We propose an algorithm that builds and maintains clusters over a network subject to mobility. This algorithm is fully decentralized and makes all the different clusters grow concurrently. The algorithm uses circulating tokens that collect…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-11-15 Thibault Bernard, Alain Bui, Laurence Pilard, Devan Sohier

Communication-Efficient and Exact Clustering Distributed Streaming Data

A widely used approach to clustering a single data stream is the two-phased approach in which the online phase creates and maintains micro-clusters while the off-line phase generates the macro-clustering from the micro-clusters. We use this…

Databases · Computer Science 2012-09-20 Dang-Hoan Tran

Data Accuracy Model for Distributed Clustering Algorithm based on Spatial Data Correlation in Wireless Sensor Networks

Objective: The main objective of this paper is to construct a distributed clustering algorithm based upon spatial data correlation among sensor nodes and perform data accuracy for each distributed cluster at their respective cluster head…

Networking and Internet Architecture · Computer Science 2011-08-15 Jyotirmoy Karjee, H. S Jamadagni

Distributed Resource Selection for Self-Organising Cloud-Edge Systems

This paper presents a distributed resource selection mechanism for diverse cloud-edge environments, enabling dynamic and context-aware allocation of resources to meet the demands of complex distributed applications. By distributing the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-10 Quentin Renau, Amjad Ullah, Emma Hart

Hierarchical Aggregation Approach for Distributed clustering of spatial datasets

In this paper, we present a new approach of distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) local clustering phase, where each…

Databases · Computer Science 2018-02-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Scalable Hierarchical Agglomerative Clustering

The applicability of agglomerative clustering, for inferring both hierarchical and flat clustering, is limited by its scalability. Existing scalable hierarchical clustering methods sacrifice quality for speed and often lead to over-merging…

Machine Learning · Computer Science 2021-10-01 Nicholas Monath, Avinava Dubey, Guru Guruganesh, Manzil Zaheer, Amr Ahmed, Andrew McCallum, Gokhan Mergen, Marc Najork, Mert Terzihan, Bryon Tjanaka, Yuan Wang, Yuchen Wu

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Zihan Wu, Zhaoke Huang, Hong Yan

Fast communication-efficient spectral clustering over distributed data

The last decades have seen a surge of interest in distributed computing thanks to advances in clustered computing and big data technology. Existing distributed algorithms typically assume all the data are already in one place, and…

Machine Learning · Computer Science 2019-05-07 Donghui Yan, Yingjie Wang, Jin Wang, Guodong Wu, Honggang Wang

Fully adaptive density-based clustering

The clusters of a distribution are often defined by the connected components of a density level set. However, this definition depends on the user-specified level. We address this issue by proposing a simple, generic algorithm, which uses an…

Methodology · Statistics 2015-10-29 Ingo Steinwart

Hashing-Based Distributed Clustering for Massive High-Dimensional Data

Clustering analysis is of substantial significance for data mining. The properties of big data raise higher demand for more efficient and economical distributed clustering methods. However, existing distributed clustering methods mainly…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-03 Yifeng Xiao, Jiang Xue, Deyu Meng