Related papers: Reliable Distributed Clustering with Redundant Dat…

Variance-based Clustering Technique for Distributed Data Mining Applications

Nowadays, huge amounts of data are naturally collected in distributed sites due to different facts and moving these data through the network for extracting useful knowledge is almost unfeasible for either technical reasons or policies.…

Databases · Computer Science 2017-03-30 Lamine M. Aouad, Nhien-An Le-Khac, Tahar Kechadi

Scalable Density-Based Distributed Clustering

Clustering has become an increasingly important task in analysing huge amounts of data. Traditional applications require that all data has to be located at the site where it is scrutinized. Nowadays, large amounts of heterogeneous, complex…

Databases · Computer Science 2014-09-24 Eshref Januzaj, Hans-Peter Kriegel, Martin Pfeifle

On a Distributed Approach for Density-based Clustering

Efficient extraction of useful knowledge from these data is still a challenge, mainly when the data is distributed, heterogeneous and of different quality depending on its corresponding local infrastructure. To reduce the overhead cost,…

Databases · Computer Science 2017-04-17 Nhien-An Le-Khac, M-Tahar Kechadi

Data Driven Resource Allocation for Distributed Learning

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy…

Machine Learning · Computer Science 2016-12-16 Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Maria Florina Balcan, Alex Smola

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache, M-Tahar Kechadi

Writing summary for the state-of-the-art methods for big data clustering in distributed environment

Big Data processing systems handle huge unstructured and structured data to store, process, and analyze through cluster analysis which helps in identifying unseen patterns to find the relationships between them. Clustering analysis over the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-11 Dipesh Gyawali

Modern hierarchical, agglomerative clustering algorithms

This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise…

Machine Learning · Statistics 2011-09-13 Daniel Müllner

Efficient Large Scale Clustering based on Data Partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Revisiting Large Scale Distributed Machine Learning

Nowadays, with the widespread of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available and the focus of machine learning has shifted from being able to infer from small training samples to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-07 Radu Cristian Ionescu

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

A Survey From Distributed Machine Learning to Distributed Deep Learning

Artificial intelligence has made remarkable progress in handling complex tasks, thanks to advances in hardware acceleration and machine learning algorithms. However, to acquire more accurate outcomes and solve more complex issues,…

Machine Learning · Computer Science 2023-09-12 Mohammad Dehghani, Zahra Yazdanparast

Fast communication-efficient spectral clustering over distributed data

The last decades have seen a surge of interests in distributed computing thanks to advances in clustered computing and big data technology. Existing distributed algorithms typically assume all the data are already in one place, and…

Machine Learning · Computer Science 2019-05-07 Donghui Yan, Yingjie Wang, Jin Wang, Guodong Wu, Honggang Wang

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou, Kyle A. Gallivan, Adrian Barbu

Distributed Graph Clustering by Load Balancing

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of…

Data Structures and Algorithms · Computer Science 2019-04-12 He Sun, Luca Zanetti

A New Family of Feasible Methods for Distributed Resource Allocation

Distributed resource allocation is a central task in network systems such as smart grids, water distribution networks, and urban transportation systems. When solving such problems in practice it is often important to have nonasymptotic…

Optimization and Control · Mathematics 2021-03-30 Xuyang Wu, Sindri Magnusson, Mikael Johansson

Distributed Multi Class SVM for Large Data Sets

Data mining algorithms are originally designed by assuming the data is available at one centralized site. These algorithms also assume that the whole data is fit into main memory while running the algorithm. But in today's scenario the data…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-03 Aruna Govada, Bhavul Gauri, S. K. Sahay

Diffusion LMS for clustered multitask networks

Recent research works on distributed adaptive networks have intensively studied the case where the nodes estimate a common parameter vector collaboratively. However, there are many applications that are multitask-oriented in the sense that…

Systems and Control · Computer Science 2013-11-04 Jie Chen, Cédric Richard, Ali Sayed

Reliable Data Storage in Distributed Hash Tables

Distributed Hash Tables offer a resilient lookup service for unstable distributed environments. Resilient data storage, however, requires additional data replication and maintenance algorithms. These algorithms can have an impact on both…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Matthew Leslie

A Distributed Clustering Algorithm for Dynamic Networks

We propose an algorithm that builds and maintains clusters over a network subject to mobility. This algorithm is fully decentralized and makes all the different clusters grow concurrently. The algorithm uses circulating tokens that collect…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-11-15 Thibault Bernard, Alain Bui, Laurence Pilard, Devan Sohier

Leachable Component Clustering

Clustering attempts to partition data instances into several distinctive groups, while the similarities among data belonging to the common partition can be principally reserved. Furthermore, incomplete data frequently occurs in many…

Machine Learning · Computer Science 2022-08-30 Miao Cheng, Xinge You