Related papers: Distributed variable screening for generalized lin…
In this paper we propose a linear variable screening method for computer experiments when the number of input variables is larger than the number of runs. This method uses a linear model to model the nonlinear data, and screens the…
In variable selection, most existing screening methods focus on marginal effects and ignore dependence between covariates. To improve the performance of selection, we incorporate pairwise effects in covariates for screening and…
We propose generalized additive partial linear models for complex data which allow one to capture nonlinear patterns of some covariates, in the presence of linear components. The proposed method improves estimation efficiency and increases…
High-dimensional covariates often admit linear factor structure. To effectively screen correlated covariates in high-dimension, we propose a conditional variable screening test based on non-parametric regression using neural networks due to…
We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…
Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this…
We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., N >> p, in the generalized linear models framework. When such datasets are too big to be analyzed…
As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure…
Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in…
Microarray studies, in order to identify genes associated with an outcome of interest, usually produce noisy measurements for a large number of gene expression features from a small number of subjects. One common approach to analyzing such…
We propose a new variable selection procedure for a functional linear model with multiple scalar responses and multiple functional predictors. This method is based on basis expansions of the involved functional predictors and coefficients…
The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into…
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly…
In this article, we study the problem of variable screening in multiple nonparametric regression model. The proposed methodology is based on the fact that the partial derivative of the regression function with respect to the irrelevant…
Linear mixed models are a versatile statistical tool to study data by accounting for fixed effects and random effects from multiple sources of variability. In many situations, a large number of candidate fixed effects is available and it is…
Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…
Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design…
In big data analysis, a simple task such as linear regression can become very challenging as the variable dimension $p$ grows. As a result, variable screening is inevitable in many scientific studies. In recent years, randomized algorithms…
Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly…
The principal support vector machines method (Li et al., 2011) is a powerful tool for sufficient dimension reduction that replaces original predictors with their low-dimensional linear combinations without loss of information. However, the…