Related papers: Variable Selection for Clustering and Classificati…
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…
Mixture models have become widely used in clustering, given the probabilistic framework on which they are based. However, for modern databases, which are characterized by their large size, these models behave disappointingly in setting out the…
The importance of variable selection for clustering has been recognized for some time, and mixture models are well-established as a statistical approach to clustering. Yet, the literature on variable selection in model-based clustering…
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…
Interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results.…
The amount of information in the form of features and variables available to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions; high-dimensional models do not…
We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that reduces the variable selection problem to a problem of estimating…
Standard approaches to tackling high-dimensional supervised classification problems often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature…
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering…
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…
We present a new data analysis perspective to determine variable importance regardless of the underlying learning task. Traditionally, variable selection is considered an important step in supervised learning for both classification and…
Relevant methods of variable selection have been proposed in model-based clustering and classification. These methods use backward or forward procedures to define the roles of the variables. Unfortunately, these stepwise…
Data analysis plays an indispensable role for value creation in industry. Cluster analysis in this context is able to explore given datasets with little or no prior knowledge and to identify unknown patterns. As (big) data complexity…
High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called…
Intelligent test requires efficient and effective analysis of high-dimensional data on a large scale. Traditionally, the analysis is often conducted by human experts, but it is not scalable in the era of big data. To tackle this challenge,…
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…
When fitting statistical models, some predictors are often found to be correlated with each other, and functioning together. Many group variable selection methods are developed to select the groups of predictors that are closely related to…
A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…
We propose two approaches for selecting variables in latent class analysis (i.e., a mixture model assuming within-component independence), which is the common model-based clustering method for mixed data. The first approach consists in…
One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…