Related papers: A Statistical Perspective on Algorithmic Leveragin…

200 papers

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, and this work was performed by individuals from many…

Data Structures and Algorithms · Computer Science 2011-11-16 Michael W. Mahoney

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury, Pradeep Ramuhalli

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science 2014-08-22 Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized…

Data Structures and Algorithms · Computer Science 2022-03-08 Aleksandros Sobczyk, Efstratios Gallopoulos

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Recent work in theoretical computer science and scientific computing has focused on nearly-linear-time algorithms for solving systems of linear equations. While introducing several novel theoretical perspectives, this work has yet to lead…

Numerical Analysis · Computer Science 2010-05-19 Petros Drineas, Michael W. Mahoney

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of…

Machine Learning · Computer Science 2019-03-21 Shahin Shahrampour, Soheil Kolouri

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a…

Machine Learning · Computer Science 2018-09-06 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Suppose a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$ with singular value decomposition $A = U_{A}\Sigma_{A} V_{A}^{T}$, where $U_{A} \in \mathbb{R}^{m \times r}$, $V_{A} \in \mathbb{R}^{n \times r}$ are orthonormal and $\Sigma_{A}…

Numerical Analysis · Mathematics 2022-04-05 Qian Zuo, Hua Xiang

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…

Methodology · Statistics 2025-05-13 Zhixiang Zhang, Sokbae Lee, Edgar Dobriban

We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices…

Machine Learning · Statistics 2015-05-26 Garvesh Raskutti, Michael Mahoney

While leverage score sampling provides powerful tools for approximating solutions to large least squares problems, the cost of computing exact scores and sampling often prohibits practical application. This paper addresses this challenge by…

Numerical Analysis · Mathematics 2025-04-29 Osman Asif Malik, Yiming Xu, Nuojin Cheng, Stephen Becker, Alireza Doostan, Akil Narayan

The statistical leverage scores of a matrix $A$ are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular…

Data Structures and Algorithms · Computer Science 2012-12-06 Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff

We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive…

Methodology · Statistics 2021-11-02 Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney

One approach to improving the running time of kernel-based machine learning methods is to build a small sketch of the input and use it in lieu of the full kernel matrix in the machine learning task of interest. Here, we describe a version…

Machine Learning · Statistics 2015-11-10 Ahmed El Alaoui, Michael W. Mahoney

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as…

Machine Learning · Statistics 2018-11-19 Patrick Chao, Tahereh Mazaheri, Bo Sun, Nicholas B. Weingartner, Zohar Nussinov

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao