Related papers: A Practical Algorithm for Distributed Clustering a…
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…
Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means…
Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make the clustering problems to be much more challenging. To reduce the complexities,…
Outliers are the points which are different from or inconsistent with the rest of the data. They can be novel, new, abnormal, unusual or noisy information. Outliers are sometimes more interesting than the majority of the data. The main…
The outlier detection problem in some cases is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that…
Clustering problems are well-studied in a variety of fields such as data science, operations research, and computer science. Such problems include variants of centre location problems, $k$-median, and $k$-means to name a few. In some cases,…
Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a…
Real-world datasets often contain outliers, and the presence of outliers can make the clustering problems to be much more challenging. In this paper, we propose a simple uniform sampling framework for solving three representative…
Outliers are ubiquitous in modern data sets. Distance-based techniques are a popular non-parametric approach to outlier detection as they require no prior assumptions on the data generating distribution and are simple to implement. Scaling…
Outlier detection in data streams has gained wide importance presently due to the increasing cases of fraud in various applications of data streams. The techniques for outlier detection have been divided into either statistics based,…
Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…
In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly…
Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier…
Clustering and outlier detection are two important tasks in data mining. Outliers frequently interfere with clustering algorithms to determine the similarity between objects, resulting in unreliable clustering results. Currently, only a few…
Often the challenge associated with tasks like fraud and spam detection[1] is the lack of all likely patterns needed to train suitable supervised learning models. In order to overcome this limitation, such tasks are attempted as outlier or…
In this paper, we consider two types of robust models of the $k$-median/$k$-means problems: the outlier-version ($k$-MedO/$k$-MeaO) and the penalty-version ($k$-MedP/$k$-MeaP), in which we can mark some points as outliers and discard them.…
We propose a new assumption in outlier detection: Normal data instances are commonly located in the area that there is hardly any fluctuation on data density, while outliers are often appeared in the area that there is violent fluctuation…
This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is…
We introduce and study the $k$-center clustering problem with set outliers, a natural and practical generalization of the classical $k$-center clustering with outliers. Instead of removing individual data points, our model allows discarding…
Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier…