Generalized massive optimal data compression

Justin Alsing; Benjamin Wandelt

doi:10.1093/mnrasl/sly029

Cosmology and Nongalactic Astrophysics 2018-04-04 v2

Abstract

Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a general procedure for optimally compressing $N$ data down to $n$ summary statistics, where $n$ is equal to the number of parameters of interest. We show that compression to the score function -- the gradient of the log-likelihood with respect to the parameters -- yields $n$ compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general non-Gaussian case as long as mild regularity conditions are satisfied, producing optimal non-linear summary statistics when appropriate. As a worked example, we derive explicitly the $n$ optimal compressed statistics for Gaussian data in the general case where both the mean and covariance depend on the parameters.
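
The Gaussian worked example mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `gaussian_score` and `gaussian_fisher`, the array layouts, and the toy linear model are assumptions made for illustration. The sketch evaluates the standard score of a Gaussian log-likelihood at a fiducial parameter point, where the mean term is linear in the data and the covariance term is quadratic, together with the corresponding Fisher matrix.

```python
import numpy as np


def gaussian_score(d, mu, C, dmu, dC):
    """Score (gradient of the Gaussian log-likelihood) at fiducial parameters.

    Hypothetical helper, for illustration only.
    d   : (N,) data vector
    mu  : (N,) mean at the fiducial parameters
    C   : (N, N) covariance at the fiducial parameters
    dmu : (n, N) derivative of the mean with respect to each parameter
    dC  : (n, N, N) derivative of the covariance with respect to each parameter
    """
    Cinv = np.linalg.inv(C)
    Cinv_r = Cinv @ (d - mu)  # inverse-covariance-weighted residual
    t = np.empty(dmu.shape[0])
    for a in range(dmu.shape[0]):
        linear = dmu[a] @ Cinv_r                 # term from the mean
        quadratic = 0.5 * Cinv_r @ dC[a] @ Cinv_r  # term from the covariance
        constant = -0.5 * np.trace(Cinv @ dC[a])   # data-independent offset
        t[a] = linear + quadratic + constant
    return t


def gaussian_fisher(C, dmu, dC):
    """Fisher matrix for Gaussian data with parameter-dependent mean and covariance."""
    Cinv = np.linalg.inv(C)
    n = dmu.shape[0]
    F = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            F[a, b] = dmu[a] @ Cinv @ dmu[b] + 0.5 * np.trace(
                Cinv @ dC[a] @ Cinv @ dC[b]
            )
    return F


# Toy usage (assumed setup): N = 100 data points, n = 2 parameters,
# linear mean mu = A @ theta and a parameter-independent covariance.
rng = np.random.default_rng(0)
N, n = 100, 2
A = rng.normal(size=(N, n))
C = np.diag(rng.uniform(0.5, 2.0, size=N))
theta_fid = np.array([1.0, -0.5])
d = rng.multivariate_normal(A @ theta_fid, C)

dmu = A.T                    # d(mu)/d(theta_a) is the a-th column of A
dC = np.zeros((n, N, N))     # covariance does not depend on theta here

t = gaussian_score(d, A @ theta_fid, C, dmu, dC)  # n numbers summarising N data
F = gaussian_fisher(C, dmu, dC)
theta_hat = theta_fid + np.linalg.solve(F, t)     # one Fisher-scoring step
```

In this toy case the covariance is fixed, so the score is linear in the data and the one-step estimate `theta_hat` coincides with the generalized least-squares solution. When `dC` is non-zero, the quadratic term makes the compressed statistics non-linear in the data.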

Keywords

Gaussian processes; statistical algorithms

Cite

@article{arxiv.1712.00012,
  title  = {Generalized massive optimal data compression},
  author = {Justin Alsing and Benjamin Wandelt},
  journal= {arXiv preprint arXiv:1712.00012},
  year   = {2018}
}

Comments

5 pages; updated to MNRAS Letters accepted version (3 Apr 2018)
