Related papers: An investigation into the Multiple Optimised Param…
We show how the massive data compression algorithm MOPED can be used to reduce, by orders of magnitude, the number of simulated datasets that are required to estimate the covariance matrix required for the analysis of gaussian-distributed…
In many fields including cosmology, statistical inference often relies on Gaussian likelihoods whose covariance matrices are estimated from a finite number of simulations. This finite-sample estimation introduces noise into the covariance,…
We develop extreme data compression for use in Bayesian model comparison via the MOPED algorithm, as well as more general score compression. We find that Bayes factors from data compressed with the MOPED algorithm are identical to those…
Bringing a high-dimensional dataset into science-ready shape is a formidable challenge that often necessitates data compression. Compression has accordingly become a key consideration for contemporary cosmology, affecting public data…
We formulate the problem of performing optimal data compression under the constraints that compressed data can be used for accurate classification in machine learning. We show that this translates to a problem of minimizing the mutual…
The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics or statistics. Commonly used imputation techniques require multiple iterations…
Covariance matrices are among the most difficult pieces of end-to-end cosmological analyses. In principle, for two-point functions, each component involves a four-point function, and the resulting covariance often has hundreds of thousands…
Embedding techniques have become essential components of large databases in the deep learning era. By encoding discrete entities, such as words, items, or graph nodes, into continuous vector spaces, embeddings facilitate more efficient…
In this paper, we propose a maximum smoothed likelihood method to estimate the component density functions of mixture models, in which the mixing proportions are known and may differ among observations. The proposed estimates maximize a…
Machine Learning models should ideally be compact and robust. Compactness provides efficiency and comprehensibility whereas robustness provides resilience. Both topics have been studied in recent years but in isolation. Here we present a…
Mixture of Experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems including regression, clustering and classification. Generally fitted by maximum likelihood estimation via the well-known…
Non-negative matrix factorization (NMF) is one of the most popular decomposition techniques for multivariate data. NMF is a core method for many machine-learning related computational problems, such as data compression, feature extraction,…
The method of maximum likelihood estimation (MLE) is a widely used statistical approach for estimating the values of one or more unknown parameters of a probabilistic model based on observed data. In this tutorial, I briefly review the…
This paper presents a novel non-linear model reduction method: Probabilistic Manifold Decomposition (PMD), which provides a powerful framework for constructing non-intrusive reduced-order models (ROMs) by embedding a high-dimensional system…
Linear models are used in online decision making, such as in machine learning, policy algorithms, and experimentation platforms. Many engineering systems that use linear models achieve computational efficiency through distributed systems…
We present a novel compression algorithm for reducing the storage of LiDAR sensor data streams. Our model exploits spatio-temporal relationships across multiple LiDAR sweeps to reduce the bitrate of both geometry and intensity values.…
Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main…
In this paper, we consider distributed maximum likelihood estimation (MLE) with dependent quantized data under the assumption that the structure of the joint probability density function (pdf) is known, but it contains unknown deterministic…
We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The…
Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…