Related papers: Feature Selection For High-Dimensional Clustering

200 papers

Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes. We provide several enhancements to mode clustering: (i) a soft variant of cluster assignment, (ii)…

Methodology · Statistics 2015-12-23 Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called…

Methodology · Statistics 2022-10-31 Tianqi Liu, Yu Lu, Biqing Zhu, Hongyu Zhao

A novel nonparametric clustering algorithm is proposed using the interpoint distances between the members of the data to reveal the inherent clustering structure existing in the given set of data, where we apply the classical nonparametric…

Methodology · Statistics 2024-09-02 Soumita Modak

We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex…

Methodology · Statistics 2017-10-05 Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the…

Machine Learning · Statistics 2018-11-05 Shahina Rahman, Valen E. Johnson

Density mode clustering is a nonparametric clustering method. The clusters are the basins of attraction of the modes of a density estimator. We study the risk of mode-based clustering. We show that the clustering risk over the cluster cores…

Statistics Theory · Mathematics 2015-05-05 Martin Azizyan, Yen-Chi Chen, Aarti Singh, Larry Wasserman

This paper concerns the critical decision process of extracting or selecting the features before applying a clustering algorithm. It is not obvious how to evaluate the importance of the features since the most popular methods to do it are…

Machine Learning · Computer Science 2021-11-23 Jean-Sebastien Dessureault, Daniel Massicotte

This paper deals with nonparametric estimation of conditional densities in mixture models in the case when additional covariates are available. The proposed approach consists of performing a preliminary clustering algorithm on the…

Statistics Theory · Mathematics 2015-02-09 Stéphane Auray, Nicolas Klutchnikoff, Laurent Rouvière

Clustering is an essential problem in machine learning and data mining. One vital factor that impacts clustering performance is how to learn or design the data representation (or features). Fortunately, recent advances in deep learning can…

Machine Learning · Computer Science 2015-01-14 Gang Chen

Feature selection methods are widely used to address the high computational overheads and curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods focus on handling homogeneous features,…

Machine Learning · Computer Science 2021-11-17 Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar

Clustering methods are a valuable tool for the identification of patterns in high dimensional data with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when…

Methodology · Statistics 2018-06-01 Marcio Valk, Gabriela Bettella Cybis

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known…

Data Structures and Algorithms · Computer Science 2021-08-20 Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach, Kirill Simonov

High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We…

Methodology · Statistics 2024-08-13 Swarnadip Ghosh, Somabha Mukherjee, Divyansh Agarwal, Yichen He, Mingzhi Song, Xuejiao Pei

Quality assessments of models in unsupervised learning, and clustering verification in particular, have been a long-standing problem in machine learning research. The lack of robust and universally applicable cluster validity scores often…

Machine Learning · Statistics 2018-03-30 Luzie Helfmann, Johannes von Lindheim, Mattes Mollenhauer, Ralf Banisch

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…

Data Analysis, Statistics and Probability · Physics 2017-10-16 Kevin McIlhany, Stephen Wiggins

We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these…

Machine Learning · Computer Science 2016-11-17 Christos Boutsidis, Malik Magdon-Ismail

It is important to identify the discriminative features for high-dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn…

Machine Learning · Statistics 2025-07-16 Zhaoyu Xing, Yang Wan, Juan Wen, Wei Zhong

The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation…

Methodology · Statistics 2020-10-27 Federico Ferraccioli, Giovanna Menardi

In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these…

Machine Learning · Statistics 2014-07-03 Marc Boullé, Romain Guigourès, Fabrice Rossi

Density Estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization,…

Machine Learning · Statistics 2019-04-02 Zhipeng Wang, David W. Scott