Related papers: Generalized massive optimal data compression

200 papers

We present a method for radical linear compression of datasets where the data are dependent on some number $M$ of parameters. We show that, if the noise in the data is independent of the parameters, we can form $M$ linear combinations of…

Astrophysics · Physics 2009-10-31 Alan Heavens, Raul Jimenez, Ofer Lahav

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially…

Machine Learning · Statistics 2026-03-03 Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big n settings, a common strategy is to partition the feature space, and then separately apply simple models to each…

Machine Learning · Statistics 2014-06-10 Rajarshi Guhaniyogi, David B. Dunson

For a collection of distributions over a countable support set, the worst case universal compression formulation by Shtarkov attempts to assign a universal distribution over the support set. The formulation aims to ensure that the universal…

Information Theory · Computer Science 2014-10-17 A. Orlitsky, N. Santhanam

A signature result in compressed sensing is that Gaussian random sampling achieves stable and robust recovery of sparse vectors under optimal conditions on the number of measurements. However, in the context of image reconstruction, it has…

Information Theory · Computer Science 2021-01-26 Ben Adcock, Simone Brugiapaglia, Matthew King-Roskamp

We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain conditions, including bounded condition number…

Applications · Statistics 2013-12-11 Michael L. Stein, Jie Chen, Mihai Anitescu

Modern data analysis frequently involves variables with highly non-Gaussian marginal distributions. However, commonly used analysis methods are most effective with roughly Gaussian data. This paper introduces an automatic transformation…

Methodology · Statistics 2016-01-11 Qing Feng, Jan Hannig, J. S. Marron

Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit…

Machine Learning · Statistics 2025-08-26 Yuta Shikuri

The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the…

Cosmology and Nongalactic Astrophysics · Physics 2023-12-18 Aizhan Akhmetzhanova, Siddharth Mishra-Sharma, Cora Dvorkin

Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important…

Methodology · Statistics 2018-02-20 Joseph Guinness, Dorit Hammerling

Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han, Liya Fu

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach…

Methodology · Statistics 2018-02-20 Shiqing Yu, Mathias Drton, Ali Shojaie

Recently, several studies proposed non-linear transformations, such as a logarithmic or Gaussianization transformation, as efficient tools to recapture information about the (Gaussian) initial conditions. During non-linear evolution, part…

Cosmology and Nongalactic Astrophysics · Physics 2015-06-16 Julien Carron, Istvan Szapudi

Random projections became popular tools to process big data. In particular, when applied to Nonnegative Matrix Factorization (NMF), it was shown that structured random projections were far more efficient than classical strategies based on…

Signal Processing · Electrical Eng. & Systems 2020-11-13 Farouk Yahaya, Matthieu Puigt, Gilles Delmaire, Gilles Roussel

Consider a Gaussian memoryless multiple source with $m$ components with joint probability distribution known only to lie in a given class of distributions. A subset of $k \leq m$ components are sampled and compressed with the objective of…

Information Theory · Computer Science 2018-03-16 Vinay Praneeth Boda

We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data. First we introduce a general distribution-dependent…

Machine Learning · Computer Science 2021-06-03 Henry W. J. Reeve, Ata Kaban

The modern practice of Radio Astronomy is characterized by extremes of data volume and rates, principally because of the direct relationship between the signal to noise ratio that can be achieved and the need to Nyquist sample the RF…

Information Theory · Computer Science 2014-05-23 Tim Natusch

As computer resources become increasingly limited, traditional statistical methods face challenges in analyzing massive data, especially in functional data analysis. To address this issue, subsampling offers a viable solution by…

Methodology · Statistics 2024-07-01 Jingxiang Pan, Xiaohui Yuan, Xiaohui Yuan

Today, with the growing demands of information storage and data transfer, data compression is becoming increasingly important. Data Compression is a technique which is used to decrease the size of data. This is very useful when some huge…

Information Theory · Computer Science 2025-06-13 Mohammad Hosseini

How much cosmological information can we reliably extract from existing and upcoming large-scale structure observations? Many summary statistics fall short in describing the non-Gaussian nature of the late-time Universe in comparison to…

Cosmology and Nongalactic Astrophysics · Physics 2024-11-15 Kai Lehman, Sven Krippendorf, Jochen Weller, Klaus Dolag