Related papers: Clustering and variable selection for categorical …

Clustering and Model Selection via Penalized Likelihood for Different-sized Categorical Data Vectors

In this study, we consider unsupervised clustering of categorical vectors that can be of different sizes using a mixture model. We use likelihood maximization to estimate the parameters of the underlying mixture model and a penalization technique to…

Statistics Theory · Mathematics 2017-09-08 Esther Derman, Erwan Le Pennec

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

A model selection approach for clustering a multinomial sequence with non-negative factorization

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC…

Machine Learning · Statistics 2015-08-17 Nam H. Lee, Runze Tang, Carey E. Priebe, Michael Rosen

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Multiclass classification by sparse multinomial logistic regression

In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size…

Statistics Theory · Mathematics 2020-11-20 Felix Abramovich, Vadim Grinshtein, Tomer Levy

A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random

Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define…

Methodology · Statistics 2025-11-05 Binh H. Ho, Long Nguyen Chi, TrungTin Nguyen, Binh T. Nguyen, Van Ha Hoang, Christopher Drovandi

Penalized Clustering of Large Scale Functional Data with Multiple Covariates

In this article, we propose a penalized clustering method for large scale data with multiple covariates through a functional data approach. In the proposed method, responses and covariates are linked together through nonparametric…

Methodology · Statistics 2008-01-17 Ping Ma, Wenxuan Zhong

Variable selection for mixed data clustering: a model-based approach

We propose two approaches for selecting variables in latent class analysis (i.e., mixture model assuming within-component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac, Mohammed Sedki

Non-asymptotic model selection in block-diagonal mixture of polynomial experts models

Model selection, via penalized likelihood type criteria, is a standard task in many statistical inference and machine learning problems. Progress has led to deriving criteria with asymptotic consistency results and an increasing emphasis on…

Statistics Theory · Mathematics 2022-05-13 TrungTin Nguyen, Faicel Chamroukhi, Hien Duy Nguyen, Florence Forbes

Finite mixture regression: A sparse variable selection by model selection for clustering

We consider a finite mixture of Gaussian regression model for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum…

Statistics Theory · Mathematics 2014-09-05 Emilie Devijver

Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods

Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the…

Methodology · Statistics 2020-03-18 Jiacong Du, Jonathan Boss, Peisong Han, Lauren J Beesley, Stephen A Goutman, Stuart Batterman, Eva L Feldman, Bhramar Mukherjee

Variable selection using MM algorithms

Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function…

Statistics Theory · Mathematics 2007-06-13 David R. Hunter, Runze Li

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…

Statistics Theory · Mathematics 2018-11-20 Felix Abramovich, Vadim Grinshtein

Classification of Functional Data with k-Nearest-Neighbor Ensembles by Fitting Constrained Multinomial Logit Models

During the last decades, many methods for the analysis of functional data including classification methods have been developed. Nonetheless, there are issues that have not been addressed satisfactorily by currently available methods, as, for…

Methodology · Statistics 2017-02-08 Karen Fuchs, Wolfgang Pößnecker, Gerhard Tutz

Variable selection through CART

This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This…

Statistics Theory · Mathematics 2011-01-05 Marie Sauvé, Christine Tuleau-Malot

Variable selection in measurement error models

Measurement error data or errors-in-variable data have been collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models due to the lack of information on the distribution of…

Statistics Theory · Mathematics 2010-02-24 Yanyuan Ma, Runze Li

Variable Selection in Causal Inference Using Penalization

In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a…

Statistics Theory · Mathematics 2014-06-06 Ashkan Ertefaie, Masoud Asgharian, David A. Stephens

Penalized regression with multiple loss functions and selection by vote

This article considers a linear model in a high dimensional data scenario. We propose a process which uses multiple loss functions both to select relevant predictors and to estimate parameters, and study its asymptotic properties. Variable…

Methodology · Statistics 2020-07-01 Guorong Dai, Ursula U. Müller

Model selection and minimax estimation in generalized linear models

We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a…

Statistics Theory · Mathematics 2016-03-31 Felix Abramovich, Vadim Grinshtein