Related papers: Efficient subsampling for high-dimensional data

200 papers

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response…

Statistics Theory · Mathematics 2013-04-16 Debashis Paul , Eric Bair , Trevor Hastie , Robert Tibshirani

The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many…

Methodology · Statistics 2024-01-23 Xin Wang , Min Yang , William Li

We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence.…

Econometrics · Economics 2020-05-18 Victor Chernozhukov , Wolfgang K. Härdle , Chen Huang , Weining Wang

Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this…

Methodology · Statistics 2018-06-19 X. Jessie Jeng , Huimin Peng , Wenbin Lu

The demand for computational resources in the modeling process grows with the scale of the datasets, since traditional approaches to regression involve inverting huge data matrices. The main problem lies in the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis , Dimitris Karlis

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu , Ping Ma , Michael W. Mahoney , Bin Yu

Shrinkage estimators that possess the ability to produce sparse solutions have become increasingly important to the analysis of today's complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts.…

Methodology · Statistics 2017-02-09 Hongmei Liu , J. Sunil Rao

In this paper, we apply shrinkage strategies to estimate regression coefficients efficiently for the high-dimensional multiple regression model, where the number of samples is smaller than the number of predictors. We assume in the sparse…

Methodology · Statistics 2017-04-19 B. Yuzbasi , M. Arashi , S. E. Ahmed

We propose a variable selection procedure that incorporates prior constraint information into the lasso. The proposed procedure combines the sample and prior information, and selects significant variables for responses in a narrower region where…

Methodology · Statistics 2011-02-19 Shurong Zheng , Guodong Song , Ning-Zhong Shi

Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include in the imputation model…

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li , Xuan Liang , Tao Zou

Sparse modeling is a major topic in machine learning and statistics. LASSO (Least Absolute Shrinkage and Selection Operator) is a popular sparse modeling method, but it is known to yield unexpectedly large bias, especially at a sparse…

Machine Learning · Computer Science 2018-08-23 Katsuyuki Hagiwara

Modern soil mapping is characterised by the need to interpolate samples of geostatistical response observations and the availability of relatively large numbers of environmental characteristics for consideration as covariates to aid this…

Applications · Statistics 2016-09-09 Benjamin R. Fitzpatrick , David W. Lamb , Kerrie Mengersen

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…

Methodology · Statistics 2018-02-01 Siliang Gong , Kai Zhang , Yufeng Liu

With the emergence of high-throughput technologies, it is possible to measure large amounts of data at relatively low cost. Such situations arise in many fields from sciences to humanities, and variable selection may be of great help to…

Computation · Statistics 2021-08-17 Jung Nicolas , Frédéric Bertrand , Myriam Maumy-Bertrand

Massive data bring big challenges of memory and computation for analysis. These challenges can be tackled by taking subsamples from the full data as a surrogate. For functional data, it is common to collect multiple measurements over…

Methodology · Statistics 2021-07-07 Hua Liu , Jinhong You , Jiguo Cao

Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets. We contribute to the development of adaptive design for estimation of finite population…

Methodology · Statistics 2024-07-08 Henrik Imberg , Xiaomi Yang , Carol Flannagan , Jonas Bärgman

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang , Daniel W. Apley , Sanjay Mehrotra

In high dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional…

Methodology · Statistics 2010-06-08 X. Jessie Jeng , Z. John Daye

Penalized regression models such as the Lasso have proved useful for variable selection in many fields, especially for situations with high-dimensional data where the number of predictors far exceeds the number of observations. These…

Methodology · Statistics 2014-03-19 Kasper Brink-Jensen , Claus Thorn Ekstrøm