Related papers: Accuracy and Robustness of Clustering Algorithms f…

Robust Hierarchical Clustering

One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across many different fields ranging from computational biology to social sciences to computer vision in part…

Machine Learning · Computer Science 2014-07-15 Maria-Florina Balcan, Yingyu Liang, Pramod Gupta

When is Clustering Perturbation Robust?

Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Generally, intuition about clustering reflects the ideal case -- exact data sets endowed with flawless dissimilarity between individual…

Machine Learning · Computer Science 2016-01-25 Margareta Ackerman, Jarrod Moore

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided…

Data Structures and Algorithms · Computer Science 2018-12-31 Maria-Florina Balcan, Colin White

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

Robust Bayesian Cluster Enumeration Based on the $t$ Distribution

A major challenge in cluster analysis is that the number of data clusters is mostly unknown and it must be estimated prior to clustering the observed data. In real-world applications, the observed data is often subject to heavy tailed noise…

Machine Learning · Statistics 2020-05-06 Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir

Robust Fair Clustering with Group Membership Uncertainty Sets

We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group. Despite significant attention, the salient issue of having incomplete knowledge about the group…

Machine Learning · Computer Science 2024-11-21 Sharmila Duppala, Juan Luque, John P. Dickerson, Seyed A. Esmaeili

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

On the Interaction Effects Between Prediction and Clustering

Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification,…

Machine Learning · Statistics 2019-01-01 Matt Barnes, Artur Dubrawski

Selection of variables for cluster analysis and classification rules

In this paper we introduce two procedures for variable selection in cluster analysis and classification rules. One is mainly oriented to detect the noisy non-informative variables, while the other deals also with multicollinearity. A…

Statistics Theory · Mathematics 2023-12-29 Ricardo Fraiman, Ana Justel, Marcela Svarc

When Should You Adjust Standard Errors for Clustering?

In empirical work it is common to estimate parameters of models and report associated standard errors that account for "clustering" of units, where clusters are defined by factors such as geography. Clustering adjustments are typically…

Statistics Theory · Mathematics 2022-09-21 Alberto Abadie, Susan Athey, Guido Imbens, Jeffrey Wooldridge

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Research on Clustering Performance of Sparse Subspace Clustering

Recently, sparse subspace clustering has been a valid tool to deal with high-dimensional data. There are two essential steps in the framework of sparse subspace clustering. One is solving the coefficient matrix of data, and the other is…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Wen-Jin Fu, Xiao-Jun Wu, He-Feng Yin, Wen-Bo Hu

Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets

Performance of clustering algorithms is evaluated with the help of accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, there exist only few…

Data Structures and Algorithms · Computer Science 2019-02-18 Artem Lutov, Mourad Khayati, Philippe Cudré-Mauroux

Machine learning for discriminating quantum measurement trajectories and improving readout

High-fidelity measurements are important for the physical implementation of quantum information protocols. Current methods for classifying measurement trajectories in superconducting qubit systems produce fidelities that are systematically…

Quantum Physics · Physics 2015-05-27 Easwar Magesan, Jay M. Gambetta, A. D. Córcoles, Jerry M. Chow

Subspace Clustering with Missing and Corrupted Data

Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces. One…

Machine Learning · Statistics 2018-01-16 Zachary Charles, Amin Jalali, Rebecca Willett

How to Design Robust Algorithms using Noisy Comparison Oracle

Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as $k$-center clustering and agglomerative hierarchical clustering. These techniques…

Data Structures and Algorithms · Computer Science 2021-05-13 Raghavendra Addanki, Sainyam Galhotra, Barna Saha

On the Robustness of Decision Tree Learning under Label Noise

In most practical problems of classifier learning, the training data suffers from the label noise. Hence, it is important to understand how robust is a learning algorithm to such label noise. This paper presents some theoretical analysis to…

Machine Learning · Computer Science 2016-08-29 Aritra Ghosh, Naresh Manwani, P. S. Sastry

Robust Clustering for Time Series Using Spectral Densities and Functional Data Analysis

In this work a robust clustering algorithm for stationary time series is proposed. The algorithm is based on the use of estimated spectral densities, which are considered as functional data, as the basic characteristic of stationary time…

Machine Learning · Statistics 2017-02-09 Diego Rivera-García, Luis Angel García-Escudero, Agustín Mayo-Iscar, Joaquín Ortega

A Rapid Review of Clustering Algorithms

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao