Minimal Random Code Learning with Mean-KL Parameterization

Machine Learning, 2023-12-05, v2

Abstract

This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
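
To illustrate what constraining the KL "by construction" can look like, here is a minimal one-dimensional sketch, not the authors' implementation. It assumes a single weight with $Q = \mathcal{N}(\mu, \sigma^2)$ and $P = \mathcal{N}(\nu, \rho^2)$, for which $D_{\mathrm{KL}}[Q \Vert P] = \ln(\rho/\sigma) + \frac{\sigma^2 + (\mu - \nu)^2}{2\rho^2} - \frac{1}{2}$. Fixing the mean and a KL budget $\kappa$ and writing $r = \sigma^2/\rho^2$ gives $r - \ln r = 2\kappa - (\mu - \nu)^2/\rho^2 + 1$, which has a solution only if $|\mu - \nu| \le \rho\sqrt{2\kappa}$. The function names, the bisection solver, and the choice of the root with $r \le 1$ are assumptions made for this example; the same equation can also be solved in closed form with the Lambert W function.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """KL[N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)] in nats."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )


def sigma_from_mean_kl(mu_q, kl, mu_p=0.0, sigma_p=1.0, iters=200):
    """Hypothetical helper: std of Q such that KL[Q || P] equals `kl`.

    Solves r - ln(r) = c for r = sigma_q^2 / sigma_p^2 on (0, 1],
    i.e. it picks the root whose variance does not exceed the prior's
    (an assumption of this sketch).
    """
    delta2 = ((mu_q - mu_p) / sigma_p) ** 2
    c = 2.0 * kl - delta2 + 1.0
    if c < 1.0:
        # The mean shift alone already costs more than the KL budget.
        raise ValueError("mean too far from the prior mean for this KL budget")

    # g(r) = r - ln(r) - c is decreasing on (0, 1], positive at exp(-c)
    # and non-positive at 1, so bisection brackets the root.
    lo, hi = math.exp(-c), 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) > c:
            lo = mid
        else:
            hi = mid
    return sigma_p * math.sqrt(0.5 * (lo + hi))


# Usage: a mean of 0.3 and a budget of 0.5 nats under a standard normal prior.
sigma = sigma_from_mean_kl(mu_q=0.3, kl=0.5)
print(sigma, gaussian_kl(0.3, sigma))
```

Under this parameterization the optimizer can move the mean freely within the feasible interval while the KL, and hence the coding cost, stays at the chosen value, which is why no annealing of a KL penalty is needed in this toy setting.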

Cite

@article{arxiv.2307.07816,
  title  = {Minimal Random Code Learning with Mean-KL Parameterization},
  author = {Jihao Andreas Lin and Gergely Flamich and José Miguel Hernández-Lobato},
  journal= {arXiv preprint arXiv:2307.07816},
  year   = {2023}
}

Comments

ICML Neural Compression Workshop 2023
