Related papers: Multi-Model Subset Selection
Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of loss of interpretability. However, the…
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…
We introduce a very general method for sparse and large-scale variable selection. The large-scale regression settings is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…
The motivation of this work is to improve the performance of standard stacking approaches or ensembles, which are composed of simple, heterogeneous base models, through the integration of the generation and selection stages for regression…
Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because one parameter is roughly needed to encode one category or level. The Group Lasso is a well known efficient algorithm…
We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor…
In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size…
Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is done by solving an l_1-regularized linear regression problem, usually called Lasso. In this work we first combine the…
Hybrid ensemble, an essential branch of ensembles, has flourished in the regression field, with studies confirming diversity's importance. However, previous ensembles consider diversity in the sub-model training stage, with limited…
An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong…
Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield…
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where…
In this paper we discuss the variable selection method from \ell0-norm constrained regression, which is equivalent to the problem of finding the best subset of a fixed size. Our study focuses on two aspects, consistency and computation. We…
Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…
Mixture of Experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems including regression, clustering and classification. Generally fitted by maximum likelihood estimation via the well-known…
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…
The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different…
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods. In particular, they are used for predicting univariate responses. In case of multiple outputs the question arises whether we separately…
Machine learning techniques always aim to reduce the generalized prediction error. In order to reduce it, ensemble methods present a good approach combining several models that results in a greater forecasting capacity. The Random Machines…