Related papers: An Econometric Perspective on Algorithmic Subsampl…

200 papers

Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…

Methodology · Statistics · 2019-04-04 · Daniel Ahfock, William J. Astle, Sylvia Richardson

Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database…

Data Structures and Algorithms · Computer Science · 2016-03-10 · Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman
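As a generic illustration of the point in the snippet above (a uniform sample of rows as a sketch of a database), the following Python sketch estimates the fraction of rows that contain a query itemset from a uniform sample and compares the estimate with a threshold. It is not the construction from the paper; the function names, the toy database, the sample size, and the threshold are all hypothetical.

```python
import random

def itemset_frequency(rows, itemset):
    """Exact fraction of rows (each a set of items) that contain every item in itemset."""
    query = set(itemset)
    return sum(1 for row in rows if query <= row) / len(rows)

def sampled_itemset_frequency(rows, itemset, sample_size, seed=0):
    """Estimate the same fraction from a uniform sample of rows drawn with replacement.

    Hypothetical helper: the sampled rows play the role of the sketch.
    """
    rng = random.Random(seed)
    sample = [rows[rng.randrange(len(rows))] for _ in range(sample_size)]
    return itemset_frequency(sample, itemset)

# Hypothetical toy database: every fourth row contains both "a" and "b".
rows = [{"a", "b", "c"} if i % 4 == 0 else {"a", "c"} for i in range(10_000)]
exact = itemset_frequency(rows, {"a", "b"})
estimate = sampled_itemset_frequency(rows, {"a", "b"}, sample_size=2_000)
threshold = 0.2  # hypothetical threshold for the "is the fraction above t?" query
print(exact, round(estimate, 3), estimate > threshold)
```

By Hoeffding's inequality, a uniform sample of size s gives an estimate within ε of the true fraction with probability at least 1 − 2exp(−2sε²), regardless of how many rows the database has.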

Recent advances in the WWW, IoT, social networks, e-commerce, etc. have generated a large volume of data. These data are mostly represented as high-dimensional, sparse datasets. Many fundamental subroutines of common data analytic…

Information Retrieval · Computer Science · 2019-10-11 · Rameshwar Pratap, Debajyoti Bera, Karthik Revanuru

Sketching is a powerful dimensionality reduction technique for accelerating algorithms for data analysis. A crucial step in sketching methods is to compute a subspace embedding (SE) for a large matrix $\mathbf{A} \in \mathbb{R}^{N \times…

Data Structures and Algorithms · Computer Science · 2021-07-14 · Rajesh Jayaram, Alireza Samadian, David P. Woodruff, Peng Ye
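The subspace embedding mentioned in the snippet above can be illustrated with CountSketch, a standard sparse embedding. This Python sketch is a generic illustration, not the method of the paper (which the snippet does not describe); the helper names, the matrix sizes, and m = 600 output rows are hypothetical, and a random bucket and sign drawn per row stand in for the hash functions used in practice. Checking two vectors is only a spot check: a subspace embedding must preserve $\|A x\|$ up to a small factor for every $x$ simultaneously.

```python
import random

def count_sketch(rows, m, seed=0):
    """Compress an n x d matrix (list of rows) to an m x d matrix S A with CountSketch.

    Each input row is added, with a random sign, to one of m output rows chosen at
    random, so S A is built in one pass over A. (Random draws per row are an
    assumption standing in for hash functions.)
    """
    rng = random.Random(seed)
    d = len(rows[0])
    out = [[0.0] * d for _ in range(m)]
    for row in rows:
        bucket = rng.randrange(m)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        target = out[bucket]
        for j, value in enumerate(row):
            target[j] += sign * value
    return out

def sq_norm_of_product(rows, x):
    """||A x||^2 for a matrix given as a list of rows."""
    return sum(sum(a * b for a, b in zip(row, x)) ** 2 for row in rows)

# Hypothetical tall matrix: 20,000 rows, 3 columns of standard normal entries.
rng = random.Random(3)
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20_000)]
SA = count_sketch(A, m=600)

# Ratios near 1 mean the sketch approximately preserves these two norms.
ratios = [
    sq_norm_of_product(SA, x) / sq_norm_of_product(A, x)
    for x in ([1.0, 0.0, 0.0], [0.3, -1.2, 2.0])
]
print([round(r, 3) for r in ratios])
```

Because the signs are independent with mean zero, the cross terms cancel in expectation, so $\|S A x\|^2$ is an unbiased estimate of $\|A x\|^2$ for any fixed $x$.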

Large-sample data became prevalent as data acquisition became cheaper and easier. While a large sample size has theoretical advantages for many statistical methods, it presents computational challenges. Sketching, or compression, is a…

Machine Learning · Statistics · 2020-05-11 · Alexander F. Lapanowski, Irina Gaynanova

Linear algebraic operations are ubiquitous in engineering applications, and arise often in a variety of fields including statistical signal processing and machine learning. With contemporary large datasets, to perform linear algebraic…

Numerical Analysis · Mathematics · 2025-09-24 · Neophytos Charalambides, Arya Mazumdar

Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated with a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. In so…

Machine Learning · Statistics · 2019-12-03 · Roberta Falcone, Angela Montanari, Laura Anderlucci

Overparametrization often helps improve generalization performance. This paper presents a dual view of overparametrization suggesting that downsampling may also help generalize. Focusing on the proportional regime $m\asymp n \asymp p$,…

Statistics Theory · Mathematics · 2023-10-17 · Xin Chen, Yicheng Zeng, Siyue Yang, Qiang Sun

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics · 2026-02-19 · Arpan Kumar, Minh Tang, Srijan Sengupta

Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and…

Machine Learning · Statistics · 2022-06-23 · Sokbae Lee, Serena Ng

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show…

Data Structures and Algorithms · Computer Science · 2024-05-10 · Sachin Garg, Kevin Tan, Michał Dereziński

Linear sketching algorithms have been widely used for processing large-scale distributed and streaming datasets. Their popularity is largely due to the fact that linear sketches can be naturally composed in the distributed model and be…

Data Structures and Algorithms · Computer Science · 2017-03-28 · Jiecao Chen, Qin Zhang

Sketching algorithms use random projections to generate a smaller sketched data set, often for the purposes of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set…

Methodology · Statistics · 2023-06-07 · R. P. Browne, J. L. Andrews
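A generic sketch-and-solve illustration of the idea in the snippet above: compress $(X, y)$ to $(SX, Sy)$ with a Gaussian random projection $S$ and solve the smaller least-squares problem. This is a textbook construction chosen for illustration, not necessarily the complete or partial sketch estimators defined in the paper; the symbol $S$, the helper names, the toy data, and the sketch size m = 200 are assumptions. To stay dependency-free, the Python code handles only two regressors (an intercept and a slope).

```python
import math
import random

def ols_two_params(x_rows, y):
    """Least-squares coefficients for two regressors via the 2x2 normal equations."""
    s11 = sum(r[0] * r[0] for r in x_rows)
    s12 = sum(r[0] * r[1] for r in x_rows)
    s22 = sum(r[1] * r[1] for r in x_rows)
    t1 = sum(r[0] * yi for r, yi in zip(x_rows, y))
    t2 = sum(r[1] * yi for r, yi in zip(x_rows, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

def gaussian_sketch(x_rows, y, m, seed=0):
    """Return (S X, S y), where S is m x n with i.i.d. N(0, 1/m) entries.

    Each row of S is drawn on the fly and discarded, so S is never stored.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    sx, sy = [], []
    for _ in range(m):
        s = [rng.gauss(0.0, 1.0) * scale for _ in range(len(y))]
        sx.append(tuple(sum(si * r[j] for si, r in zip(s, x_rows)) for j in range(2)))
        sy.append(sum(si * yi for si, yi in zip(s, y)))
    return sx, sy

# Hypothetical data: y = 1 + 2 x + noise, with n = 2000 rows and an intercept column.
rng = random.Random(1)
x_rows = [(1.0, rng.uniform(-1.0, 1.0)) for _ in range(2000)]
y = [1.0 + 2.0 * r[1] + rng.gauss(0.0, 0.5) for r in x_rows]

full = ols_two_params(x_rows, y)
sx, sy = gaussian_sketch(x_rows, y, m=200)
sketched = ols_two_params(sx, sy)
print([round(b, 3) for b in full], [round(b, 3) for b in sketched])
```

The sketched fit uses only m = 200 compressed rows and is noticeably noisier than the full-sample fit; the extra error shrinks as m grows, which is the computation-versus-precision trade-off that sketched regression has to manage.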

Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information…

Machine Learning · Computer Science · 2023-01-18 · Fred Lu, Edward Raff, James Holt

Massive datasets have been generated in fields including computer vision, medical imaging, and astronomy, and their large scale and high dimensionality hamper the implementation of classical statistical models. To tackle…

Statistics Theory · Mathematics · 2023-05-30 · Hang Yu, Zhenxing Dou, Zhiwei Chen, Xiaomeng Yan

High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that…

Machine Learning · Computer Science · 2016-04-21 · Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably…

Numerical Analysis · Mathematics · 2022-01-04 · Daniel Ahfock, William J. Astle, Sylvia Richardson

Constrained least squares problems arise in many applications. In practice, with high-dimensional input data, their memory and computation costs are high. We employ the so-called "sketching" strategy to project the least squares…

Optimization and Control · Mathematics · 2021-09-07 · Ke Chen, Ruhui Jin

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science · 2014-08-22 · Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford
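The row-selection idea in the snippet above can be illustrated with leverage-score sampling, a standard way to pick rows of a tall, skinny matrix for regression. This Python sketch is a generic illustration, not the scheme from the paper: it computes exact leverage scores, which costs as much as the full fit, so it shows only the sampling-and-reweighting step. The helper names, the toy data, and the sample size s = 300 are hypothetical, and the 2x2 algebra limits it to two regressors.

```python
import random

def gram_inverse(x_rows):
    """Inverse of the 2x2 Gram matrix X^T X."""
    a = sum(r[0] * r[0] for r in x_rows)
    b = sum(r[0] * r[1] for r in x_rows)
    d = sum(r[1] * r[1] for r in x_rows)
    det = a * d - b * b
    return ((d / det, -b / det), (-b / det, a / det))

def leverage_scores(x_rows):
    """h_i = x_i^T (X^T X)^{-1} x_i for each row; the scores sum to the column count."""
    g = gram_inverse(x_rows)
    return [
        r[0] * (g[0][0] * r[0] + g[0][1] * r[1]) + r[1] * (g[1][0] * r[0] + g[1][1] * r[1])
        for r in x_rows
    ]

def fit(x_rows, y):
    """Least squares for two regressors via the normal equations."""
    g = gram_inverse(x_rows)
    t1 = sum(r[0] * yi for r, yi in zip(x_rows, y))
    t2 = sum(r[1] * yi for r, yi in zip(x_rows, y))
    return (g[0][0] * t1 + g[0][1] * t2, g[1][0] * t1 + g[1][1] * t2)

def leverage_sample(x_rows, y, s, seed=0):
    """Draw s rows with probability p_i = h_i / sum(h), rescaling each by 1/sqrt(s * p_i)."""
    rng = random.Random(seed)
    h = leverage_scores(x_rows)
    total = sum(h)
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(x_rows)), weights=probs, k=s)
    sx, sy = [], []
    for i in idx:
        w = (s * probs[i]) ** -0.5
        sx.append((w * x_rows[i][0], w * x_rows[i][1]))
        sy.append(w * y[i])
    return sx, sy, h

# Hypothetical tall, skinny design: intercept plus one regressor, with five high-leverage rows.
rng = random.Random(2)
x_rows = [(1.0, rng.gauss(0.0, 1.0)) for _ in range(5000)] + [(1.0, 15.0)] * 5
y = [3.0 - 1.5 * r[1] + rng.gauss(0.0, 0.5) for r in x_rows]

sx, sy, h = leverage_sample(x_rows, y, s=300)
full, sampled = fit(x_rows, y), fit(sx, sy)
print([round(b, 3) for b in full], [round(b, 3) for b in sampled])
```

The five rows at x = 15 have the largest leverage scores, so they are drawn far more often than their share of the data; a uniform sample of 300 rows (with replacement) would miss all five with probability about 0.74.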

In compressive learning theory, instead of solving a statistical learning problem from the input data, a so-called sketch is computed from the data prior to learning. The sketch has to capture enough information to solve the problem…

Machine Learning · Statistics · 2019-10-23 · Michael P. Sheehan, Antoine Gonon, Mike E. Davies