Related papers: Robust Clustering Using Outlier-Sparsity Regulariz…

200 papers

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the robust $k$-Means problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means…

Machine Learning · Computer Science 2026-05-11 Tianle Jiang, Yufa Zhou

Outliers are the points which are different from or inconsistent with the rest of the data. They can be novel, new, abnormal, unusual or noisy information. Outliers are sometimes more interesting than the majority of the data. The main…

Computer Vision and Pattern Recognition · Computer Science 2014-06-20 Singh Vijendra, Pathak Shivani

In many situations where the interest lies in identifying clusters one might expect that not all available variables carry information about these groups. Furthermore, data quality (e.g. outliers or missing entries) might present a serious…

Machine Learning · Statistics 2012-01-31 Yumi Kondo, Matias Salibian-Barrera, Ruben Zamar

We consider the problem of clustering datasets in the presence of arbitrary outliers. Traditional clustering algorithms such as k-means and spectral clustering are known to perform poorly for datasets contaminated with even a small number…

Machine Learning · Statistics 2021-02-02 Prateek R. Srivastava, Purnamrita Sarkar, Grani A. Hanasusanto

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou, Kyle A. Gallivan, Adrian Barbu

Clustering and outlier detection are two important tasks in data mining. Outliers frequently interfere with clustering algorithms to determine the similarity between objects, resulting in unreliable clustering results. Currently, only a few…

Machine Learning · Computer Science 2024-12-10 Qi Li, Shuliang Wang

As in other estimation scenarios, likelihood based estimation in the normal mixture set-up is highly non-robust against model misspecification and presence of outliers (apart from being an ill-posed optimization problem). A robust…

Methodology · Statistics 2023-12-20 Soumya Chakraborty, Ayanendranath Basu, Abhik Ghosh

Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when…

Machine Learning · Statistics 2021-08-17 Olga Dorabiala, J. Nathan Kutz, Aleksandr Aravkin

We study classic $k$-means/median clustering, fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-12 Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang

We propose a new assumption in outlier detection: normal data instances are commonly located in areas where the data density hardly fluctuates, while outliers often appear in areas where there is violent fluctuation…

Machine Learning · Computer Science 2020-06-09 Ding Liu, Hui Li

In many machine learning tasks, a common approach for dealing with large-scale data is to build a small summary, e.g., coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers…

Machine Learning · Computer Science 2022-01-24 Zixiu Wang, Yiwen Guo, Hu Ding

K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd–Max quantization of the empirical distribution, we develop a…

Machine Learning · Computer Science 2026-04-14 Vikrant Malik, Taylan Kargin, Babak Hassibi

The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are…

Machine Learning · Statistics 2015-08-24 Reinhard Heckel, Helmut Bölcskei

In the real world, datasets often contain outliers. Moreover, outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers have high time complexity (e.g. quadratic or cubic…

Computational Geometry · Computer Science 2020-02-28 Hu Ding, Zixiu Wang

Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier…

Methodology · Statistics 2024-05-31 Katharine M. Clark, Paul D. McNicholas

Clustering analysis is one of the critical tasks in machine learning. Traditionally, clustering has been an independent task, separate from outlier detection. Due to the fact that the performance of clustering can be significantly eroded by…

Machine Learning · Computer Science 2022-08-12 Jiahao Deng, Eli T. Brown

The outlier detection problem in some cases is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that…

Machine Learning · Computer Science 2014-05-25 M. H. Marghny, Ahmed I. Taloba

The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$…

Machine Learning · Computer Science 2023-09-07 Amit Deshpande, Rameshwar Pratap

Consensus clustering aggregates partitions in order to find a better fit by reconciling clustering results from different sources/executions. In practice, noise and outliers exist in clustering tasks, which, however, may significantly…

Machine Learning · Computer Science 2023-01-03 Deguang Kong, Miao Lu, Konstantin Shmakov, Jian Yang

Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is…

Machine Learning · Statistics 2025-08-04 Katharine M. Clark, Paul D. McNicholas