Related papers: Efficient subsampling for high-dimensional data

"Pre-conditioning" for feature selection and regression in high-dimensional problems

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response…

Statistics Theory · Mathematics 2013-04-16 Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani

Efficient Data Reduction Strategies for Big Data and High-Dimensional LASSO Regressions

The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many…

Methodology · Statistics 2024-01-23 Xin Wang, Min Yang, William Li

LASSO-Driven Inference in Time and Space

We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence.…

Econometrics · Economics 2020-05-18 Victor Chernozhukov, Wolfgang K. Härdle, Chen Huang, Weining Wang

Post-Lasso Inference for High-Dimensional Regression

Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this…

Methodology · Statistics 2018-06-19 X. Jessie Jeng, Huimin Peng, Wenbin Lu

On the selection of optimal subdata for big data regression based on leverage scores

The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Prediction Weighted Maximum Frequency Selection

Shrinkage estimators that possess the ability to produce sparse solutions have become increasingly important to the analysis of today's complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts.…

Methodology · Statistics 2017-02-09 Hongmei Liu, J. Sunil Rao

Big Data Analysis Using Shrinkage Strategies

In this paper, we apply shrinkage strategies to estimate regression coefficients efficiently for the high-dimensional multiple regression model, where the number of samples is smaller than the number of predictors. We assume in the sparse…

Methodology · Statistics 2017-04-19 B. Yuzbasi, M. Arashi, S. E. Ahmed

Variable Selection Incorporating Prior Constraint Information into Lasso

We propose the variable selection procedure incorporating prior constraint information into lasso. The proposed procedure combines the sample and prior information, and selects significant variables for responses in a narrower region where…

Methodology · Statistics 2011-02-19 Shurong Zheng, Guodong Song, Ning-Zhong Shi

A comparison of strategies for selecting auxiliary variables for multiple imputation

Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include in the imputation model…

Methodology · Statistics 2022-04-01 Rheanna M. Mainzer, Cattram D. Nguyen, John B. Carlin, Margarita Moreno-Betancur, Ian R. White, Katherine J. Lee

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

On an improvement of LASSO by scaling

A sparse modeling is a major topic in machine learning and statistics. LASSO (Least Absolute Shrinkage and Selection Operator) is a popular sparse modeling method while it has been known to yield unexpected large bias especially at a sparse…

Machine Learning · Computer Science 2018-08-23 Katsuyuki Hagiwara

Ultrahigh Dimensional Variable Selection for Mapping Soil Carbon

Modern soil mapping is characterised by the need to interpolate samples of geostatistical response observations and the availability of relatively large numbers of environmental characteristics for consideration as covariates to aid this…

Applications · Statistics 2016-09-09 Benjamin R. Fitzpatrick, David W. Lamb, Kerrie Mengersen

Efficient Test-based Variable Selection for High-dimensional Linear Models

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…

Methodology · Statistics 2018-02-01 Siliang Gong, Kai Zhang, Yufeng Liu

AcSel: selecting variables with accuracy in correlated datasets

With the emergence of high-throughput technologies, it is possible to measure large amounts of data relatively at low cost. Such situations arise in many fields from sciences to humanities, and variable selection may be of great help to…

Computation · Statistics 2021-08-17 Jung Nicolas, Frédéric Bertrand, Myriam Maumy-Bertrand

Functional L-Optimality Subsampling for Massive Data

Massive data bring the big challenges of memory and computation for analysis. These challenges can be tackled by taking subsamples from the full data as a surrogate. For functional data, it is common to collect multiple measurements over…

Methodology · Statistics 2021-07-07 Hua Liu, Jinhong You, Jiguo Cao

Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples

Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets. We contribute to the development of adaptive design for estimation of finite population…

Methodology · Statistics 2024-07-08 Henrik Imberg, Xiaomi Yang, Carol Flannagan, Jonas Bärgman

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Sparse covariance thresholding for high-dimensional variable selection

In high-dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional…

Methodology · Statistics 2010-06-08 X. Jessie Jeng, Z. John Daye

Inference for feature selection using the Lasso with high-dimensional data

Penalized regression models such as the Lasso have proved useful for variable selection in many fields - especially for situations with high-dimensional data where the numbers of predictors far exceeds the number of observations. These…

Methodology · Statistics 2014-03-19 Kasper Brink-Jensen, Claus Thorn Ekstrøm