Related papers: A Model-free Variable Screening Method Based on Le…

200 papers

The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

Active learning aims to obtain a classifier of high accuracy by using fewer label requests in comparison to passive learning by selecting effective queries. Many active learning methods have been developed in the past two decades, which…

Machine Learning · Computer Science 2016-08-08 Cem Orhan, Öznur Taştan

We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized…

Data Structures and Algorithms · Computer Science 2022-03-08 Aleksandros Sobczyk, Efstratios Gallopoulos

Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly…

Methodology · Statistics 2023-01-09 Naveed Merchant, Jeffrey D. Hart

In high dimensional analysis, effects of explanatory variables on responses sometimes rely on certain exposure variables, such as time or environmental factors. In this paper, to characterize the importance of each predictor, we utilize its…

Methodology · Statistics 2018-04-11 Yeqing Zhou, Jingyuan Liu, Zhihui Hao, Liping Zhu

In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed…

Methodology · Statistics 2024-05-09 Tianbo Diao, Lianqiang Qu, Bo Li, Liuquan Sun

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…

Methodology · Statistics 2018-02-01 Siliang Gong, Kai Zhang, Yufeng Liu

A major hurdle in machine learning is scalability to massive datasets. Approaches to overcome this hurdle include compression of the data matrix and distributing the computations. Leverage score sampling provides a compressed…

Information Theory · Computer Science 2020-09-16 Neophytos Charalambides, Mert Pilanci, Alfred O. Hero

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a…

Machine Learning · Computer Science 2018-09-06 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to…

Machine Learning · Statistics 2018-05-18 Kevin He, Jian Kang, Hyokyoung Grace Hong, Ji Zhu, Yanming Li, Huazhen Lin, Han Xu, Yi Li

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of…

Machine Learning · Computer Science 2019-03-21 Shahin Shahrampour, Soheil Kolouri

One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales…

Methodology · Statistics 2013-06-25 Ping Ma, Michael W. Mahoney, Bin Yu

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain…

Data Structures and Algorithms · Computer Science 2014-06-04 Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the…

Machine Learning · Computer Science 2018-09-05 Magda Gregorová, Jason Ramapuram, Alexandros Kalousis, Stéphane Marchand-Maillet

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics 2014-09-24 Bo Jiang, Jun S. Liu

In a standard regression problem, we have a set of explanatory variables whose effect on some response vector is modeled. For wide binary data, such as genetic marker data, we often have two limitations. First, we have more parameters than…

Methodology · Statistics 2021-09-20 Katharina Parry, Leo N. Geppert, Alexander Munteanu, Katja Ickstadt

In this paper we propose a linear variable screening method for computer experiments when the number of input variables is larger than the number of runs. This method uses a linear model to model the nonlinear data, and screens the…

Methodology · Statistics 2020-06-16 Chunya Li, Daijun Chen, Shifeng Xiong

We propose a new model-free feature screening method based on energy distances for ultrahigh-dimensional binary classification problems. With a high probability, the proposed method retains only relevant features after discarding all the…

Methodology · Statistics 2023-05-19 Sarbojit Roy, Soham Sarkar, Subhajit Dutta, Anil K. Ghosh

We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…

Methodology · Statistics 2025-05-13 Roman Parzer, Peter Filzmoser, Laura Vana-Gür