Related papers: An investigation into the Multiple Optimised Param…

Massive data compression for parameter-dependent covariance matrices

We show how the massive data compression algorithm MOPED can be used to reduce, by orders of magnitude, the number of simulated datasets that are required to estimate the covariance matrix required for the analysis of Gaussian-distributed…

Cosmology and Nongalactic Astrophysics · Physics 2017-10-18 Alan Heavens, Elena Sellentin, Damien de Mijolla, Alvise Vianello

Data Compression with Noise Suppression for Inference under Noisy Covariance

In many fields including cosmology, statistical inference often relies on Gaussian likelihoods whose covariance matrices are estimated from a finite number of simulations. This finite-sample estimation introduces noise into the covariance,…

Cosmology and Nongalactic Astrophysics · Physics 2025-08-20 Sunao Sugiyama, Minsu Park

Extreme data compression for Bayesian model comparison

We develop extreme data compression for use in Bayesian model comparison via the MOPED algorithm, as well as more general score compression. We find that Bayes factors from data compressed with the MOPED algorithm are identical to those…

Instrumentation and Methods for Astrophysics · Physics 2023-07-17 Alan F. Heavens, Arrykrishna Mootoovaloo, Roberto Trotta, Elena Sellentin

Extreme data compression while searching for new physics

Bringing a high-dimensional dataset into science-ready shape is a formidable challenge that often necessitates data compression. Compression has accordingly become a key consideration for contemporary cosmology, affecting public data…

Cosmology and Nongalactic Astrophysics · Physics 2020-08-25 Alan Heavens, Elena Sellentin, Andrew Jaffe

Optimal Compression for Minimizing Classification Error Probability: an Information-Theoretic Approach

We formulate the problem of performing optimal data compression under the constraints that compressed data can be used for accurate classification in machine learning. We show that this translates to a problem of minimizing the mutual…

Signal Processing · Electrical Eng. & Systems 2022-11-04 Jingchao Gao, Ao Tang, Weiyu Xu

EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data

The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics or statistics. Commonly used imputation techniques require multiple iterations…

Machine Learning · Computer Science 2020-09-25 Thu Nguyen, Duy H. M. Nguyen, Huy Nguyen, Binh T. Nguyen, Bruce A. Wade

Data Compression and Covariance Matrix Inspection: Cosmic Shear

Covariance matrices are among the most difficult pieces of end-to-end cosmological analyses. In principle, for two-point functions, each component involves a four-point function, and the resulting covariance often has hundreds of thousands…

Cosmology and Nongalactic Astrophysics · Physics 2023-04-20 Tassia Ferreira, Tianqing Zhang, Nianyi Chen, Scott Dodelson

Mixed-Precision Embeddings for Large-Scale Recommendation Models

Embedding techniques have become essential components of large databases in the deep learning era. By encoding discrete entities, such as words, items, or graph nodes, into continuous vector spaces, embeddings facilitate more efficient…

Information Retrieval · Computer Science 2024-10-18 Shiwei Li, Zhuoqi Hu, Xing Tang, Haozhao Wang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

Maximum Smoothed Likelihood Component Density Estimation in Mixture Models with Known Mixing Proportions

In this paper, we propose a maximum smoothed likelihood method to estimate the component density functions of mixture models, in which the mixing proportions are known and may differ among observations. The proposed estimates maximize a…

Methodology · Statistics 2014-07-14 Tao Yu, Pengfei Li, Jing Qin

Robust Model Compression Using Deep Hypotheses

Machine Learning models should ideally be compact and robust. Compactness provides efficiency and comprehensibility whereas robustness provides resilience. Both topics have been studied in recent years but in isolation. Here we present a…

Machine Learning · Computer Science 2021-03-16 Omri Armstrong, Ran Gilad-Bachrach

Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models

Mixture of Experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems including regression, clustering and classification. Generally fitted by maximum likelihood estimation via the well-known…

Machine Learning · Statistics 2018-10-30 Faicel Chamroukhi, Bao-Tuyen Huynh

An Efficient Algorithm for Non-Negative Matrix Factorization with Random Projections

Non-negative matrix factorization (NMF) is one of the most popular decomposition techniques for multivariate data. NMF is a core method for many machine-learning related computational problems, such as data compression, feature extraction,…

Numerical Analysis · Computer Science 2017-12-07 Gabriele Torre, Michael Graber

Tutorial: Maximum likelihood estimation in the context of an optical measurement

The method of maximum likelihood estimation (MLE) is a widely used statistical approach for estimating the values of one or more unknown parameters of a probabilistic model based on observed data. In this tutorial, I briefly review the…

Data Analysis, Statistics and Probability · Physics 2018-12-03 Anthony Vella

Nonlinear Model Reduction by Probabilistic Manifold Decomposition

This paper presents a novel non-linear model reduction method: Probabilistic Manifold Decomposition (PMD), which provides a powerful framework for constructing non-intrusive reduced-order models (ROMs) by embedding a high-dimensional system…

Numerical Analysis · Mathematics 2026-01-09 Jiaming Guo, Dunhui Xiao

You Only Compress Once: Optimal Data Compression for Estimating Linear Models

Linear models are used in online decision making, such as in machine learning, policy algorithms, and experimentation platforms. Many engineering systems that use linear models achieve computational efficiency through distributed systems…

Machine Learning · Computer Science 2021-03-04 Jeffrey Wong, Eskil Forsell, Randall Lewis, Tobias Mao, Matthew Wardrop

MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models

We present a novel compression algorithm for reducing the storage of LiDAR sensor data streams. Our model exploits spatio-temporal relationships across multiple LiDAR sweeps to reduce the bitrate of both geometry and intensity values.…

Image and Video Processing · Electrical Eng. & Systems 2021-01-12 Sourav Biswas, Jerry Liu, Kelvin Wong, Shenlong Wang, Raquel Urtasun

Real Acceleration of Communication Process in Distributed Algorithms with Compression

Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main…

Optimization and Control · Mathematics 2023-12-14 Svetlana Tkachenko, Artem Andreev, Aleksandr Beznosikov, Alexander Gasnikov

Robust Distributed Maximum Likelihood Estimation with Dependent Quantized Data

In this paper, we consider distributed maximum likelihood estimation (MLE) with dependent quantized data under the assumption that the structure of the joint probability density function (pdf) is known, but it contains unknown deterministic…

Information Theory · Computer Science 2013-09-17 Xiaojing Shen, Pramod K. Varshney, Yunmin Zhu

Learning Representations by Maximizing Compression

We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The…

Computer Vision and Pattern Recognition · Computer Science 2011-08-05 Karol Gregor, Yann LeCun

Distributed Parameter Estimation via Pseudo-likelihood

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…

Machine Learning · Computer Science 2012-07-03 Qiang Liu, Alexander Ihler