Related papers: Clustering and variable selection for categorical …

200 papers

In this study, we consider unsupervised clustering of categorical vectors of possibly different sizes using a mixture model. We use likelihood maximization to estimate the parameters of the underlying mixture model and a penalization technique to…

Statistics Theory · Mathematics 2017-09-08 Esther Derman , Erwan Le Pennec

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Matthieu Marbac , Mohammed Sedki

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC…

Machine Learning · Statistics 2015-08-17 Nam H. Lee , Runze Tang , Carey E. Priebe , Michael Rosen

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews , Paul D. McNicholas

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie , Wei Pan , Xiaotong Shen

In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size…

Statistics Theory · Mathematics 2020-11-20 Felix Abramovich , Vadim Grinshtein , Tomer Levy

Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define…

In this article, we propose a penalized clustering method for large scale data with multiple covariates through a functional data approach. In the proposed method, responses and covariates are linked together through nonparametric…

Methodology · Statistics 2008-01-17 Ping Ma , Wenxuan Zhong

We propose two approaches for selecting variables in latent class analysis (i.e., a mixture model assuming within-component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac , Mohammed Sedki

Model selection, via penalized likelihood type criteria, is a standard task in many statistical inference and machine learning problems. Progress has led to deriving criteria with asymptotic consistency results and an increasing emphasis on…

Statistics Theory · Mathematics 2022-05-13 TrungTin Nguyen , Faicel Chamroukhi , Hien Duy Nguyen , Florence Forbes

We consider a finite mixture of Gaussian regression model for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum…

Statistics Theory · Mathematics 2014-09-05 Emilie Devijver

Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the…

Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function…

Statistics Theory · Mathematics 2007-06-13 David R. Hunter , Runze Li

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…

Statistics Theory · Mathematics 2018-11-20 Felix Abramovich , Vadim Grinshtein

During the last decades, many methods for the analysis of functional data, including classification methods, have been developed. Nonetheless, there are issues that have not been addressed satisfactorily by currently available methods, as, for…

Methodology · Statistics 2017-02-08 Karen Fuchs , Wolfgang Pößnecker , Gerhard Tutz

This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This…

Statistics Theory · Mathematics 2011-01-05 Marie Sauvé , Christine Tuleau-Malot

Measurement error data or errors-in-variable data have been collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models due to the lack of information on the distribution of…

Statistics Theory · Mathematics 2010-02-24 Yanyuan Ma , Runze Li

In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a…

Statistics Theory · Mathematics 2014-06-06 Ashkan Ertefaie , Masoud Asgharian , David A. Stephens

This article considers a linear model in a high dimensional data scenario. We propose a process which uses multiple loss functions both to select relevant predictors and to estimate parameters, and study its asymptotic properties. Variable…

Methodology · Statistics 2020-07-01 Guorong Dai , Ursula U. Müller

We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a…

Statistics Theory · Mathematics 2016-03-31 Felix Abramovich , Vadim Grinshtein