Related papers: A Model-free Variable Screening Method Based on Le…

On the selection of optimal subdata for big data regression based on leverage scores

The demand for computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem lies in the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis
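
This entry concerns choosing a subset of rows (subdata) for regression according to leverage scores. The sketch below is a rough illustration only. It is not the authors' procedure, whose abstract is truncated here. It works as follows:

- It computes the leverage scores h_ii, the diagonal of the hat matrix X(XᵀX)⁻¹Xᵀ, from a thin QR factorization.
- It keeps the k rows with the largest scores.
- It fits ordinary least squares on those rows.

The helper names and the top-k rule are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, assuming X has full column rank."""
    # With a thin QR factorization X = QR, the hat matrix equals Q Q^T,
    # so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(Q**2, axis=1)


def top_leverage_subdata_ols(X, y, k):
    """Hypothetical helper: fit OLS on the k rows with the largest leverage scores."""
    h = leverage_scores(X)
    idx = np.argsort(h)[-k:]  # indices of the k largest scores
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta


rng = np.random.default_rng(0)
n, d = 10_000, 5
# Design with an intercept column plus standard normal covariates.
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
beta_true = np.arange(1.0, d + 1)
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_sub = top_leverage_subdata_ols(X, y, k=200)
print(np.round(beta_sub, 2))
```

The scores are the diagonal of a rank-d projection matrix, so they always lie in [0, 1] and sum to d. This gives a quick sanity check on any implementation.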

On Fast Leverage Score Sampling and Optimal Learning

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

ALEVS: Active Learning by Statistical Leverage Sampling

Active learning aims to obtain a classifier of high accuracy by using fewer label requests in comparison to passive learning by selecting effective queries. Many active learning methods have been developed in the past two decades, which…

Machine Learning · Computer Science 2016-08-08 Cem Orhan, Öznur Taştan

Estimating leverage scores via rank revealing methods and randomization

We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized…

Data Structures and Algorithms · Computer Science 2022-03-08 Aleksandros Sobczyk, Efstratios Gallopoulos
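
For reference, the quantity these estimators target can be computed exactly when the matrix fits in memory. Suppose A has rank r and thin SVD A = UΣVᵀ. The i-th leverage score is then the squared Euclidean norm of row i of U restricted to its first r columns, and the scores sum to r. The sketch below is that exact dense baseline, at O(nd²) cost for an n×d matrix. It is not the rank-revealing or randomized algorithm of the paper. The function name and the rank tolerance, which mirrors the default of NumPy's `matrix_rank`, are assumptions for illustration.

```python
import numpy as np


def leverage_scores_any_rank(A, rtol=None):
    """Hypothetical helper: exact leverage scores of a dense matrix of arbitrary rank.

    Left singular vectors belonging to numerically zero singular values are
    dropped, so the scores sum to the numerical rank of A.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        # Mirrors the default tolerance of numpy.linalg.matrix_rank.
        rtol = max(A.shape) * np.finfo(A.dtype).eps
    r = int(np.sum(s > rtol * s.max()))
    # Squared row norms of the orthonormal basis U_r of the column space of A.
    return np.sum(U[:, :r] ** 2, axis=1)


rng = np.random.default_rng(1)
# A 100 x 6 matrix of rank 3, built as a product of 100 x 3 and 3 x 6 factors.
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 6))
h = leverage_scores_any_rank(A)
print(h.sum())  # approximately 3.0, the rank of A
```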

Screening Methods for Classification Based on Non-parametric Bayesian Tests

Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly…

Methodology · Statistics 2023-01-09 Naveed Merchant, Jeffrey D. Hart

Model-Free Conditional Feature Screening with Exposure Variables

In high dimensional analysis, effects of explanatory variables on responses sometimes rely on certain exposure variables, such as time or environmental factors. In this paper, to characterize the importance of each predictor, we utilize its…

Methodology · Statistics 2018-04-11 Yeqing Zhou, Jingyuan Liu, Zhihui Hao, Liping Zhu

Distributed variable screening for generalized linear models

In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed…

Methodology · Statistics 2024-05-09 Tianbo Diao, Lianqiang Qu, Bo Li, Liuquan Sun

Efficient Test-based Variable Selection for High-dimensional Linear Models

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…

Methodology · Statistics 2018-02-01 Siliang Gong, Kai Zhang, Yufeng Liu

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. Approaches to overcome this hurdle include compression of the data matrix and distributing the computations. Leverage score sampling provides a compressed…

Information Theory · Computer Science 2020-09-16 Neophytos Charalambides, Mert Pilanci, Alfred O. Hero

Leveraged volume sampling for linear regression

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a…

Machine Learning · Computer Science 2018-09-06 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Covariance-Insured Screening

Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to…

Machine Learning · Statistics 2018-05-18 Kevin He, Jian Kang, Hyokyoung Grace Hong, Ji Zhu, Yanming Li, Huazhen Lin, Han Xu, Yi Li

On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of…

Machine Learning · Computer Science 2019-03-21 Shahin Shahrampour, Soheil Kolouri

A Statistical Perspective on Algorithmic Leveraging

One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales…

Methodology · Statistics 2013-06-25 Ping Ma, Michael W. Mahoney, Bin Yu
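
The sampling-and-rescaling step described in this abstract can be sketched as follows, assuming a full-rank n×d design X and NumPy:

1. Compute the leverage scores h_ii.
2. Sample r rows with replacement using the probabilities π_i = h_ii/d.
3. Rescale each sampled row and its response by 1/√(r π_i).
4. Solve least squares on the rescaled subsample.

This is a minimal illustration of basic leverage-based sampling. The variants and the statistical analysis developed in the paper are not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def algorithmic_leveraging(X, y, r, rng):
    """Hypothetical helper: leverage-based sample-and-rescale estimate of OLS coefficients."""
    n = X.shape[0]
    Q, _ = np.linalg.qr(X, mode="reduced")
    h = np.sum(Q**2, axis=1)            # leverage scores; they sum to d for full-rank X
    probs = h / h.sum()                 # importance-sampling distribution pi_i = h_ii / d
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(r * probs[idx])   # rescaling factors 1 / sqrt(r * pi_i)
    # Least squares on the rescaled subsample, i.e. weighted LS with weights proportional to 1 / pi_i.
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta


rng = np.random.default_rng(0)
n, d, r = 20_000, 4, 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(n)
beta_lev = algorithmic_leveraging(X, y, r, rng)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-data OLS for comparison
print(np.round(beta_lev, 3), np.round(beta_full, 3))
```

The rescaling step is what keeps the subsampled problem an unbiased stand-in for the full one. Without it, rows drawn with high probability would be over-represented in the fit.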

Provable Deterministic Leverage Score Sampling

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain…

Data Structures and Algorithms · Computer Science 2014-06-04 Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis
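
The selection rule in this abstract can be illustrated with rank-k column leverage scores. Write A = UΣVᵀ. Column j then scores ℓ_j = ‖(V_k)_{j,:}‖², the squared norm of row j of the matrix of top-k right singular vectors, and these scores sum to k. The sketch below keeps a fixed number c of columns with the largest scores. It then measures how well the span of those columns reconstructs A. Keeping a fixed c, and the helper names, are simplifying assumptions; the paper's exact selection criterion and guarantees may differ.

```python
import numpy as np


def deterministic_column_selection(A, k, c):
    """Hypothetical helper: pick the c columns of A with the largest rank-k leverage scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Column j's score is the squared norm of column j of the top-k rows of V^T.
    lev = np.sum(Vt[:k, :] ** 2, axis=0)
    cols = np.sort(np.argsort(lev)[::-1][:c])  # deterministic: no randomness involved
    return cols, lev


def projection_error(A, cols):
    """Frobenius error of projecting A onto the span of the selected columns, C C^+ A."""
    C = A[:, cols]
    return np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), "fro")


rng = np.random.default_rng(2)
# A 200 x 50 matrix that is rank 5 up to small additive noise.
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
A += 1e-3 * rng.standard_normal(A.shape)
cols, lev = deterministic_column_selection(A, k=5, c=10)
rel_err = projection_error(A, cols) / np.linalg.norm(A, "fro")
print(cols, rel_err)
```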

Large-scale Nonlinear Variable Selection via Kernel Random Features

We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the…

Machine Learning · Computer Science 2018-09-05 Magda Gregorová, Jason Ramapuram, Alexandros Kalousis, Stéphane Marchand-Maillet

Variable selection for general index models via sliced inverse regression

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics 2014-09-24 Bo Jiang, Jun S. Liu

Cross-Leverage Scores for Selecting Subsets of Explanatory Variables

In a standard regression problem, we have a set of explanatory variables whose effect on some response vector is modeled. For wide binary data, such as genetic marker data, we often have two limitations. First, we have more parameters than…

Methodology · Statistics 2021-09-20 Katharina Parry, Leo N. Geppert, Alexander Munteanu, Katja Ickstadt

Linear screening for high-dimensional computer experiments

In this paper we propose a linear variable screening method for computer experiments when the number of input variables is larger than the number of runs. This method uses a linear model to model the nonlinear data, and screens the…

Methodology · Statistics 2020-06-16 Chunya Li, Daijun Chen, Shifeng Xiong

On Exact Feature Screening in Ultrahigh-dimensional Binary Classification

We propose a new model-free feature screening method based on energy distances for ultrahigh-dimensional binary classification problems. With a high probability, the proposed method retains only relevant features after discarding all the…

Methodology · Statistics 2023-05-19 Sarbojit Roy, Soham Sarkar, Subhajit Dutta, Anil K. Ghosh

Sparse Data-Driven Random Projection in Regression for High-Dimensional Data

We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…

Methodology · Statistics 2025-05-13 Roman Parzer, Peter Filzmoser, Laura Vana-Gür