Related papers: Statistical inference for sketching algorithms
Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…
Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…
Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and…
Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show…
Linear regression is a classic method of data analysis. In recent years, sketching -- a method of dimension reduction using random sampling, random projections, or both -- has gained popularity as an effective computational approximation…
Randomized algorithms can be used to speed up the analysis of large datasets. In this paper, we develop a unified methodology for statistical inference via randomized sketching or projections in two of the most fundamental problems in…
We consider statistical as well as algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For a LS problem with input data $(X, Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$,…
Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning.…
A methodology for using random sketching in the context of model order reduction for high-dimensional parameter-dependent systems of equations was introduced in [Balabanov and Nouy 2019, Part I]. Following this framework, we here construct…
In this work, we study distributed sketching methods for large scale regression problems. We leverage multiple randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in…
Random projections or sketching are widely used in many algorithmic and learning contexts. Here we study the performance of iterative Hessian sketch for least-squares problems. By leveraging and extending recent results from random matrix…
The Count-Min sketch is an important and well-studied data summarization method. It allows one to estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the sketch depends on…
Accounting for the uncertainty in the predictions of modern neural networks is a challenging and important task in many domains. Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure…
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically…
We consider a least squares regression problem where the data has been generated from a linear model, and we are interested to learn the unknown regression parameters. We consider "sketch-and-solve" methods that randomly project the data…
We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an \emph{algorithmic perspective}, when using sketching matrices…
This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and…
Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch…
We introduce a technique for estimating a structured covariance matrix from observations of a random vector which have been sketched. Each observed random vector $\boldsymbol{x}_t$ is reduced to a single number by taking its inner product…
This paper develops conformal inference methods to construct a confidence interval for the frequency of a queried object in a very large discrete data set, based on a sketch with a lower memory footprint. This approach requires no knowledge…