English
Related papers

Related papers: Compressing Large Sample Data for Discriminant Ana…

200 papers

This article considers "compressive learning," an approach to large-scale machine learning where datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is…

Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…

Methodology · Statistics 2019-04-04 Daniel Ahfock , William J. Astle , Sylvia Richardson

Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated with a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. In so…

Machine Learning · Statistics 2019-12-03 Roberta Falcone , Angela Montanari , Laura Anderlucci

Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow…

Machine Learning · Statistics 2025-06-03 Rajarshi Guhaniyogi , Laura Baracaldo , Sudipto Banerjee

Bayesian computation of high dimensional linear regression models with a popular Gaussian scale mixture prior distribution using Markov Chain Monte Carlo (MCMC) or its variants can be extremely slow or completely prohibitive due to the…

Methodology · Statistics 2021-05-12 Rajarshi Guhaniyogi , Aaron Scheffler

As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the…

Machine Learning · Statistics 2013-03-26 Rajarshi Guhaniyogi , David B. Dunson

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran , Helen Thompson , James M. McGree

We describe a general framework -- compressive statistical learning -- for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized…

Machine Learning · Statistics 2021-06-23 Rémi Gribonval , Gilles Blanchard , Nicolas Keriven , Yann Traonmilin

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang

In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new…

Machine Learning · Computer Science 2018-02-12 Di Wang , Jinhui Xu

This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a…

Data Structures and Algorithms · Computer Science 2015-02-11 David P. Woodruff

Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…

Methodology · Statistics 2023-04-14 Shuyuan Wu , Xuening Zhu , Hansheng Wang

In the compressive learning theory, instead of solving a statistical learning problem from the input data, a so-called sketch is computed from the data prior to learning. The sketch has to capture enough information to solve the problem…

Machine Learning · Statistics 2019-10-23 Michael P. Sheehan , Antoine Gonon , Mike E. Davies

We propose a compressive classification framework for settings where the data dimensionality is significantly higher than the sample size. The proposed method, referred to as compressive regularized discriminant analysis (CRDA) is based on…

Machine Learning · Statistics 2020-11-13 Muhammad Naveed Tabassum , Esa Ollila

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch…

Machine Learning · Computer Science 2017-05-08 Nicolas Keriven , Anthony Bourrier , Rémi Gribonval , Patrick Pérez

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. In this context, we investigate strategies for dynamically increasing the effective sample size, when…

Machine Learning · Computer Science 2016-10-10 Hadi Daneshmand , Aurelien Lucchi , Thomas Hofmann

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Muzhou Yu , Linfeng Zhang , Kaisheng Ma

Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To…

Software Engineering · Computer Science 2016-02-17 Flávio Medeiros , Christian Kästner , Márcio Ribeiro , Rohit Gheyi , Sven Apel

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis , Georgios B. Giannakis

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han , Kean Ming Tan , Ting Yang , Tong Zhang
‹ Prev 1 2 3 10 Next ›