Related papers: A Practical Algorithm for Distributed Clustering a…

200 papers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou, Kyle A. Gallivan, Adrian Barbu

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means…

Machine Learning · Computer Science 2026-05-11 Tianle Jiang, Yufa Zhou

Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make clustering problems much more challenging. To reduce the complexities,…

Data Structures and Algorithms · Computer Science 2020-05-04 Hu Ding, Jiawei Huang, Haikuo Yu

Outliers are the points which are different from or inconsistent with the rest of the data. They can be novel, new, abnormal, unusual or noisy information. Outliers are sometimes more interesting than the majority of the data. The main…

Computer Vision and Pattern Recognition · Computer Science 2014-06-20 Singh Vijendra, Pathak Shivani

The outlier detection problem in some cases is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that…

Machine Learning · Computer Science 2014-05-25 M. H. Marghny, Ahmed I. Taloba

Clustering problems are well-studied in a variety of fields such as data science, operations research, and computer science. Such problems include variants of centre location problems, $k$-median, and $k$-means to name a few. In some cases,…

Data Structures and Algorithms · Computer Science 2017-07-17 Zachary Friggstad, Kamyar Khodamoradi, Mohsen Rezapour, Mohammad R. Salavatipour

Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks the best solution that disregards a…

Machine Learning · Computer Science 2022-02-28 Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

Real-world datasets often contain outliers, and the presence of outliers can make clustering problems much more challenging. In this paper, we propose a simple uniform sampling framework for solving three representative…

Machine Learning · Computer Science 2023-10-04 Jiawei Huang, Wenjie Liu, Hu Ding

Outliers are ubiquitous in modern data sets. Distance-based techniques are a popular non-parametric approach to outlier detection as they require no prior assumptions on the data generating distribution and are simple to implement. Scaling…

Machine Learning · Statistics 2016-05-04 Mario Lucic, Olivier Bachem, Andreas Krause

Outlier detection in data streams has gained wide importance presently due to the increasing cases of fraud in various applications of data streams. The techniques for outlier detection have been divided into either statistics based,…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-03-25 Parneeta Dhaliwal, M. P. S. Bhatia, Priti Bansal

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…

Machine Learning · Statistics 2015-05-27 Pedro A. Forero, Vassilis Kekatos, Georgios B. Giannakis

In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-30 Xiangyu Guo, Shi Li

Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier…

Methodology · Statistics 2024-05-31 Katharine M. Clark, Paul D. McNicholas

Clustering and outlier detection are two important tasks in data mining. Outliers frequently interfere with clustering algorithms to determine the similarity between objects, resulting in unreliable clustering results. Currently, only a few…

Machine Learning · Computer Science 2024-12-10 Qi Li, Shuliang Wang

Often the challenge associated with tasks like fraud and spam detection[1] is the lack of all likely patterns needed to train suitable supervised learning models. In order to overcome this limitation, such tasks are attempted as outlier or…

Machine Learning · Computer Science 2018-08-22 Utkarsh Porwal, Smruthi Mukund

In this paper, we consider two types of robust models of the $k$-median/$k$-means problems: the outlier-version ($k$-MedO/$k$-MeaO) and the penalty-version ($k$-MedP/$k$-MeaP), in which we can mark some points as outliers and discard them.…

Data Structures and Algorithms · Computer Science 2021-01-01 Yishui Wang, Rolf H. Möhring, Chenchen Wu, Dachuan Xu, Dongmei Zhang

We propose a new assumption in outlier detection: normal data instances are commonly located in areas where there is hardly any fluctuation in data density, while outliers often appear in areas where there is violent fluctuation…

Machine Learning · Computer Science 2020-06-09 Ding Liu, Hui Li

This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is…

Data Structures and Algorithms · Computer Science 2020-04-14 Sungjin Im, Mahshid Montazer Qaem, Benjamin Moseley, Xiaorui Sun, Rudy Zhou

We introduce and study the $k$-center clustering problem with set outliers, a natural and practical generalization of the classical $k$-center clustering with outliers. Instead of removing individual data points, our model allows discarding…

Data Structures and Algorithms · Computer Science 2025-12-23 Vaishali Surianarayanan, Neeraj Kumar, Stavros Sintos

Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier…

Machine Learning · Computer Science 2025-10-28 Juan A. Lara, David Lizcano, Víctor Rampérez, Javier Soriano