Related papers: Distributed variable screening for generalized lin…

Linear screening for high-dimensional computer experiments

In this paper we propose a linear variable screening method for computer experiments when the number of input variables is larger than the number of runs. This method uses a linear model to fit the nonlinear data, and screens the…

Methodology · Statistics · 2020-06-16 · Chunya Li, Daijun Chen, Shifeng Xiong

Penalized linear regression with high-dimensional pairwise screening

In variable selection, most existing screening methods focus on marginal effects and ignore dependence between covariates. To improve the performance of selection, we incorporate pairwise effects in covariates for screening and…

Methodology · Statistics · 2019-02-12 · Siliang Gong, Kai Zhang, Yufeng Liu

Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates

We propose generalized additive partial linear models for complex data which allow one to capture nonlinear patterns of some covariates, in the presence of linear components. The proposed method improves estimation efficiency and increases…

Statistics Theory · Mathematics · 2014-05-26 · Li Wang, Lan Xue, Annie Qu, Hua Liang

Conditional nonparametric variable screening by neural factor regression

High-dimensional covariates often admit linear factor structure. To effectively screen correlated covariates in high-dimension, we propose a conditional variable screening test based on non-parametric regression using neural networks due to…

Econometrics · Economics · 2024-08-21 · Jianqing Fan, Weining Wang, Yue Zhao

Sparse Data-Driven Random Projection in Regression for High-Dimensional Data

We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…

Methodology · Statistics · 2025-05-13 · Roman Parzer, Peter Filzmoser, Laura Vana-Gür

Distributed Feature Screening via Componentwise Debiasing

Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this…

Methodology · Statistics · 2019-03-12 · Xingxiang Li, Runze Li, Zhiming Xia, Chen Xu

Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., $N \gg p$, in the generalized linear models framework. When such datasets are too big to be analyzed…

Methodology · Statistics · 2020-07-23 · Lu Tang, Ling Zhou, Peter X.-K. Song

Selective Inference with Distributed Data

As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure…

Methodology · Statistics · 2023-03-14 · Sifan Liu, Snigdha Panigrahi

Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in…

Machine Learning · Computer Science · 2022-04-26 · Runxue Bao, Xidong Wu, Wenhan Xian, Heng Huang

Screening methods for linear errors-in-variables models in high dimensions

Microarray studies, in order to identify genes associated with an outcome of interest, usually produce noisy measurements for a large number of gene expression features from a small number of subjects. One common approach to analyzing such…

Methodology · Statistics · 2021-04-21 · Linh Nghiem, Francis K. C. Hui, Samuel Mueller, A. H. Welsh

A method for variable selection in a multivariate functional linear regression model

We propose a new variable selection procedure for a functional linear model with multiple scalar responses and multiple functional predictors. This method is based on basis expansions of the involved functional predictors and coefficients…

Statistics Theory · Mathematics · 2023-11-03 · Alban Mina Mbina, Guy Martial Nkiet

Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach

The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into…

Methodology · Statistics · 2021-03-09 · Zhe Fei, Yi Li

High-dimensional variable selection via tilting

The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly…

Methodology · Statistics · 2016-11-29 · Haeran Cho, Piotr Fryzlewicz

On Variable Screening in Multiple Nonparametric Regression Model

In this article, we study the problem of variable screening in multiple nonparametric regression model. The proposed methodology is based on the fact that the partial derivative of the regression function with respect to the irrelevant…

Methodology · Statistics · 2021-01-19 · Subhra Sankar Dhar, Prashant Jha, Aranyak Acharyya

Sparse Linear Mixed Model Selection via Streamlined Variational Bayes

Linear mixed models are a versatile statistical tool to study data by accounting for fixed effects and random effects from multiple sources of variability. In many situations, a large number of candidate fixed effects is available and it is…

Methodology · Statistics · 2022-09-09 · Emanuele Degani, Luca Maestrini, Dorota Toczydłowska, Matt P. Wand

Variable selection for general index models via sliced inverse regression

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics · 2014-09-24 · Bo Jiang, Jun S. Liu

Non-penalized variable selection in high-dimensional linear model settings via generalized fiducial inference

Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design…

Methodology · Statistics · 2018-02-13 · Jonathan P Williams, Jan Hannig

Random Partitioning and Distribution-based Thresholding for Iterative Variable Screening in High Dimensions

In big data analysis, a simple task such as linear regression can become very challenging as the variable dimension $p$ grows. As a result, variable screening is inevitable in many scientific studies. In recent years, randomized algorithms…

Methodology · Statistics · 2019-02-13 · Yu-Hsiang Cheng, Tzee-Ming Huang, Su-Yun Huang

Screening Methods for Classification Based on Non-parametric Bayesian Tests

Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly…

Methodology · Statistics · 2023-01-09 · Naveed Merchant, Jeffrey D. Hart

Distributed estimation of principal support vector machines for sufficient dimension reduction

The principal support vector machines method (Li et al., 2011) is a powerful tool for sufficient dimension reduction that replaces original predictors with their low-dimensional linear combinations without loss of information. However, the…

Machine Learning · Statistics · 2019-12-02 · Jun Jin, Chao Ying, Zhou Yu