Related papers: Feature Selection For High-Dimensional Clustering

A Comprehensive Approach to Mode Clustering

Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes. We provide several enhancements to mode clustering: (i) a soft variant of cluster assignment, (ii)…

Methodology · Statistics 2015-12-23 Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman

Clustering High-dimensional Data via Feature Selection

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called…

Methodology · Statistics 2022-10-31 Tianqi Liu, Yu Lu, Biqing Zhu, Hongyu Zhao

A new interpoint distance-based clustering algorithm using kernel density estimation

A novel nonparametric clustering algorithm is proposed using the interpoint distances between the members of the data to reveal the inherent clustering structure existing in the given set of data, where we apply the classical nonparametric…

Methodology · Statistics 2024-09-02 Soumita Modak

Feature Screening in Large Scale Cluster Analysis

We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex…

Methodology · Statistics 2017-10-05 Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

A Fast Algorithm for Clustering High Dimensional Feature Vectors

We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the…

Machine Learning · Statistics 2018-11-05 Shahina Rahman, Valen E. Johnson

Risk Bounds For Mode Clustering

Density mode clustering is a nonparametric clustering method. The clusters are the basins of attraction of the modes of a density estimator. We study the risk of mode-based clustering. We show that the clustering risk over the cluster cores…

Statistics Theory · Mathematics 2015-05-05 Martin Azizyan, Yen-Chi Chen, Aarti Singh, Larry Wasserman

Feature selection or extraction decision process for clustering using PCA and FRSD

This paper concerns the critical decision process of extracting or selecting the features before applying a clustering algorithm. It is not obvious to evaluate the importance of the features since the most popular methods to do it are…

Machine Learning · Computer Science 2021-11-23 Jean-Sebastien Dessureault, Daniel Massicotte

On clustering procedures and nonparametric mixture estimation

This paper deals with nonparametric estimation of conditional densities in mixture models in the case when additional covariates are available. The proposed approach consists of performing a preliminary clustering algorithm on the…

Statistics Theory · Mathematics 2015-02-09 Stéphane Auray, Nicolas Klutchnikoff, Laurent Rouvière

Deep Learning with Nonparametric Clustering

Clustering is an essential problem in machine learning and data mining. One vital factor that impacts clustering performance is how to learn or design the data representation (or features). Fortunately, recent advances in deep learning can…

Machine Learning · Computer Science 2015-01-14 Gang Chen

A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering

Feature selection methods are widely used to address the high computational overheads and curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods focus on handling homogeneous features,…

Machine Learning · Computer Science 2021-11-17 Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of patterns in high dimensional data with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when…

Methodology · Statistics 2018-06-01 Marcio Valk, Gabriela Bettella Cybis

Parameterized Complexity of Feature Selection for Categorical Data Clustering

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known…

Data Structures and Algorithms · Computer Science 2021-08-20 Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach, Kirill Simonov

Feature Selection in High-dimensional Spaces Using Graph-Based Methods

High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We…

Methodology · Statistics 2024-08-13 Swarnadip Ghosh, Somabha Mukherjee, Divyansh Agarwal, Yichen He, Mingzhi Song, Xuejiao Pei

On Hyperparameter Search in Cluster Ensembles

Quality assessments of models in unsupervised learning and clustering verification in particular have been a long-standing problem in the machine learning research. The lack of robust and universally applicable cluster validity scores often…

Machine Learning · Statistics 2018-03-30 Luzie Helfmann, Johannes von Lindheim, Mattes Mollenhauer, Ralf Banisch

High Dimensional Cluster Analysis Using Path Lengths

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…

Data Analysis, Statistics and Probability · Physics 2017-10-16 Kevin McIlhany, Stephen Wiggins

Deterministic Feature Selection for $k$-means Clustering

We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these…

Machine Learning · Computer Science 2016-11-17 Christos Boutsidis, Malik Magdon-Ismail

GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering

It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection can not be directly applied. To learn…

Machine Learning · Statistics 2025-07-16 Zhaoyu Xing, Yang Wan, Juan Wen, Wei Zhong

Modal clustering of matrix-variate data

The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation…

Methodology · Statistics 2020-10-27 Federico Ferraccioli, Giovanna Menardi

Nonparametric Hierarchical Clustering of Functional Data

In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these…

Machine Learning · Statistics 2014-07-03 Marc Boullé, Romain Guigourès, Fabrice Rossi

Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications

Density Estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization,…

Machine Learning · Statistics 2019-04-02 Zhipeng Wang, David W. Scott