Related papers: Statistical inference for sketching algorithms

Statistical properties of sketching algorithms

Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…

Methodology · Statistics 2019-04-04 Daniel Ahfock, William J. Astle, Sylvia Richardson

A Framework for Statistical Inference via Randomized Algorithms

Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…

Methodology · Statistics 2025-05-13 Zhixiang Zhang, Sokbae Lee, Edgar Dobriban

Least Squares Estimation Using Sketched Data with Heteroskedastic Errors

Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and…

Machine Learning · Statistics 2022-06-23 Sokbae Lee, Serena Ng

Distributed Least Squares in Small Space via Sketching and Bias Reduction

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show…

Data Structures and Algorithms · Computer Science 2024-05-10 Sachin Garg, Kevin Tan, Michał Dereziński

A Projector-Based Approach to Quantifying Total and Excess Uncertainties for Sketched Linear Regression

Linear regression is a classic method of data analysis. In recent years, sketching -- a method of dimension reduction using random sampling, random projections, or both -- has gained popularity as an effective computational approximation…

Machine Learning · Statistics 2020-08-04 Jocelyn T. Chi, Ilse C. F. Ipsen

Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms

Randomized algorithms can be used to speed up the analysis of large datasets. In this paper, we develop a unified methodology for statistical inference via randomized sketching or projections in two of the most fundamental problems in…

Statistics Theory · Mathematics 2024-04-02 Leda Wang, Zhixiang Zhang, Edgar Dobriban

A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

We consider statistical as well as algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For a LS problem with input data $(X, Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$,…

Machine Learning · Statistics 2015-08-26 Garvesh Raskutti, Michael Mahoney

Learning with SGD and Random Features

Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning.…

Machine Learning · Statistics 2019-01-25 Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco

Randomized linear algebra for model reduction. Part II: minimal residual methods and dictionary-based approximation

A methodology for using random sketching in the context of model order reduction for high-dimensional parameter-dependent systems of equations was introduced in [Balabanov and Nouy 2019, Part I]. Following this framework, we here construct…

Numerical Analysis · Mathematics 2022-03-25 Oleg Balabanov, Anthony Nouy

Distributed Sketching Methods for Privacy Preserving Regression

In this work, we study distributed sketching methods for large scale regression problems. We leverage multiple randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-23 Burak Bartan, Mert Pilanci

Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform

Random projections or sketching are widely used in many algorithmic and learning contexts. Here we study the performance of iterative Hessian sketch for least-squares problems. By leveraging and extending recent results from random matrix…

Optimization and Control · Mathematics 2020-10-26 Jonathan Lacotte, Sifan Liu, Edgar Dobriban, Mert Pilanci

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

The Count-Min sketch is an important and well-studied data summarization method. It allows one to estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the sketch depends on…

Data Structures and Algorithms · Computer Science 2018-11-13 Daniel Ting

Scalable computation of prediction intervals for neural networks via matrix sketching

Accounting for the uncertainty in the predictions of modern neural networks is a challenging and important task in many domains. Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure…

Machine Learning · Statistics 2022-05-09 Alexander Fishkov, Maxim Panov

Improved Frequency Estimation Algorithms with and without Predictions

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically…

Data Structures and Algorithms · Computer Science 2023-12-13 Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian

Asymptotics for Sketching in Least Squares Regression

We consider a least squares regression problem where the data has been generated from a linear model, and we are interested to learn the unknown regression parameters. We consider "sketch-and-solve" methods that randomly project the data…

Statistics Theory · Mathematics 2019-10-08 Edgar Dobriban, Sifan Liu

Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares -- ICML

We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices…

Machine Learning · Statistics 2015-05-26 Garvesh Raskutti, Michael Mahoney

Learning Linear Models Using Distributed Iterative Hessian Sketching

This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and…

Optimization and Control · Mathematics 2021-12-09 Han Wang, James Anderson

Sketching for Large-Scale Learning of Mixture Models

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch…

Machine Learning · Computer Science 2017-05-08 Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez

Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices

We introduce a technique for estimating a structured covariance matrix from observations of a random vector which have been sketched. Each observed random vector $\boldsymbol{x}_t$ is reduced to a single number by taking its inner product…

Information Theory · Computer Science 2015-10-09 Sohail Bahmani, Justin Romberg

Conformal Frequency Estimation using Discrete Sketched Data with Coverage for Distinct Queries

This paper develops conformal inference methods to construct a confidence interval for the frequency of a queried object in a very large discrete data set, based on a sketch with a lower memory footprint. This approach requires no knowledge…

Methodology · Statistics 2023-08-17 Matteo Sesia, Stefano Favaro, Edgar Dobriban