Related papers: On stepwise regression
Given data y(n) and p(n) covariates x(n) one problem in linear regression is to decide which if any of the covariates to include. There are many articles on this problem but all are based on a stochastic model for the data. This paper gives…
We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then solves the…
Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to assume that the model can be well…
We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of…
Given data $y$ and $k$ covariates $x$ the problem is to decide which covariates to include when approximating $y$ by a linear function of the covariates. The decision is based on replacing subsets of the covariates by i.i.d. normal random…
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual…
Given data $\mathbf{y}$ and $k$ covariates $\mathbf{x}_j$ one problem in linear regression is to decide which if any of the covariates to include when regressing the dependent variable $\mathbf{y}$ on the covariates $\mathbf{x}_j$. In this…
We consider the estimation of regression models on strata defined using a categorical covariate, in order to identify interactions between this categorical covariate and the other predictors. A basic approach requires the choice of a…
The lasso is a popular tool for sparse linear regression, especially for problems in which the number of variables p exceeds the number of observations n. But when p>n, the lasso criterion is not strictly convex, and hence it may not have a…
It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient…
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response…
The Lasso regression is a popular regularization method for feature selection in statistics. Prior to computing the Lasso estimator in both linear and generalized linear models, it is common to conduct a preliminary rescaling of the feature…
Regression with the lasso penalty is a popular tool for performing dimension reduction when the number of covariates is large. In many applications of the lasso, like in genomics, covariates are subject to measurement error. We study the…
Sparse linear regression is a vast field and there are many different algorithms available to build models. Two new papers published in Statistical Science study the comparative performance of several sparse regression methodologies,…
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a…
In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they…
Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model…
High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning…
Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small…
It is argued that all model based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches although for different reasons. In the first…