Related papers: An Iterative Scheme for Leverage-based Approximate…

A Statistical Perspective on Algorithmic Leveraging

One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales…

Methodology · Statistics 2013-06-25 Ping Ma, Michael W. Mahoney, Bin Yu

Revisiting Approximate Leverage Score Sketching for Matrix Least Squares

We revisit the problem of sketching using approximate leverage scores for matrix least squares problems of the form $\| AX - B \|_F^2$ where the design matrix $A \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We derive the…

Numerical Analysis · Mathematics 2026-03-31 Brett W. Larsen, Tamara G. Kolda

On Fast Leverage Score Sampling and Optimal Learning

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

Gradient Coding with Iterative Block Leverage Score Sampling

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings, to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive…

Information Theory · Computer Science 2024-06-27 Neophytos Charalambides, Mert Pilanci, Alfred Hero

Leveraging Discarded Samples for Tighter Estimation of Multiple-Set Aggregates

Many datasets such as market basket data, text or hypertext documents, and sensor observations recorded in different locations or time periods, are modeled as a collection of sets over a ground set of keys. We are interested in basic…

Databases · Computer Science 2009-03-05 Edith Cohen, Haim Kaplan

Iterative Row Sampling

There has been significant interest and progress recently in algorithms that solve regression problems involving tall and thin matrices in input sparsity time. These algorithms find a shorter equivalent of an $n \times d$ matrix where $n \gg d$, which…

Data Structures and Algorithms · Computer Science 2013-04-05 Mu Li, Gary L. Miller, Richard Peng

On the selection of optimal subdata for big data regression based on leverage scores

The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis

Learning by mirror averaging

Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with…

Statistics Theory · Mathematics 2008-11-10 A. Juditsky, P. Rigollet, A. B. Tsybakov

LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data

We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive…

Methodology · Statistics 2021-11-02 Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney

Iterative estimating equations: Linear convergence and asymptotic properties

We propose an iterative estimating equations procedure for analysis of longitudinal data. We show that, under very mild conditions, the probability that the procedure converges at an exponential rate tends to one as the sample size…

Statistics Theory · Mathematics 2007-12-18 Jiming Jiang, Yihui Luan, You-Gan Wang

Fast algorithms for least square problems with Kronecker lower subsets

While leverage score sampling provides powerful tools for approximating solutions to large least squares problems, the cost of computing exact scores and sampling often prohibits practical application. This paper addresses this challenge by…

Numerical Analysis · Mathematics 2025-04-29 Osman Asif Malik, Yiming Xu, Nuojin Cheng, Stephen Becker, Alireza Doostan, Akil Narayan

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury, Pradeep Ramuhalli

Estimating leverage scores via rank revealing methods and randomization

We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized…

Data Structures and Algorithms · Computer Science 2022-03-08 Aleksandros Sobczyk, Efstratios Gallopoulos

An Aggregation Technique For Large-Scale PEPA Models With Non-Uniform Populations

Performance analysis based on modelling consists of two major steps: model construction and model analysis. Formal modelling techniques significantly aid model construction but can exacerbate model analysis. In particular, here we consider…

Performance · Computer Science 2013-09-09 Alireza Pourranjbar, Jane Hillston

Starting Small -- Learning with Adaptive Sample Sizes

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. In this context, we investigate strategies for dynamically increasing the effective sample size, when…

Machine Learning · Computer Science 2016-10-10 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Iterative Averaging in the Quest for Best Test Error

We analyse and explain the increased generalisation performance of iterate averaging using a Gaussian process perturbation model between the true and batch risk surface on the high dimensional quadratic. We derive three phenomena…

Machine Learning · Statistics 2021-11-02 Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen Roberts

Aggregation for Regression Learning

This paper studies statistical aggregation procedures in the regression setting. A motivating factor is the existence of many different methods of estimation, leading to possibly competing estimators. We consider here three different types of…

Statistics Theory · Mathematics 2007-06-13 Florentina Bunea, Alexandre Tsybakov, Marten Wegkamp

Tighter Low-rank Approximation via Sampling the Leveraged Element

In this work, we propose a new randomized algorithm for computing a low-rank approximation to a given matrix. Taking an approach different from existing literature, our method first involves a specific biased sampling, with an element being…

Data Structures and Algorithms · Computer Science 2014-10-16 Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi

Randomized maximum-contrast selection: subagging for large-scale regression

We introduce a very general method for sparse and large-scale variable selection. The large-scale regression setting is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…

Statistics Theory · Mathematics 2019-07-31 Jelena Bradic

On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of…

Machine Learning · Computer Science 2019-03-21 Shahin Shahrampour, Soheil Kolouri