Fast Compute for ML Optimization

Nick Polson; Vadim Sokolov

Computation 2026-02-17 v1 Machine Learning

Authors: Nick Polson, Vadim Sokolov

Abstract

We study optimization for losses that admit a variance-mean scale-mixture representation. Under this representation, each EM iteration is a weighted least squares update in which latent variables determine observation and parameter weights; these play roles analogous to Adam's second-moment scaling and AdamW's weight decay, but are derived from the model. The resulting Scale Mixture EM (SM-EM) algorithm removes user-specified learning-rate and momentum schedules. On synthetic ill-conditioned logistic regression benchmarks with $p \in \{20, \ldots, 500\}$, SM-EM with Nesterov acceleration attains up to $13\times$ lower final loss than Adam tuned by learning-rate grid search. For a 40-point regularization path, sharing sufficient statistics across penalty values yields a $10\times$ runtime reduction relative to the same tuned-Adam protocol. For the base (non-accelerated) algorithm, EM monotonicity guarantees nonincreasing objective values; adding Nesterov extrapolation trades this guarantee for faster empirical convergence.
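To make the "each EM iteration is a weighted least squares update" idea concrete, here is a minimal sketch for ridge-penalized logistic regression. It assumes the Pólya-Gamma scale-mixture representation of the logistic likelihood, a well-known instance of this family; the abstract does not say which mixture or penalty the paper uses, so this should not be read as the authors' SM-EM implementation. The names `pg_weight`, `penalized_nll`, and `em_logistic`, the ridge penalty `lam`, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def pg_weight(psi):
    """E[omega | psi] under the assumed Polya-Gamma PG(1, psi) mixing law.

    Equals tanh(psi / 2) / (2 * psi), with the limiting value 1/4 at psi = 0.
    """
    psi = np.asarray(psi, dtype=float)
    out = np.full_like(psi, 0.25)
    nz = np.abs(psi) > 1e-8
    out[nz] = np.tanh(psi[nz] / 2.0) / (2.0 * psi[nz])
    return out


def penalized_nll(beta, X, y, lam):
    """Logistic negative log-likelihood plus a ridge penalty (illustrative)."""
    psi = X @ beta
    # logaddexp(0, psi) is a numerically stable log(1 + exp(psi)).
    return np.sum(np.logaddexp(0.0, psi) - y * psi) + 0.5 * lam * (beta @ beta)


def em_logistic(X, y, lam=1.0, n_iter=50):
    """Base (non-accelerated) EM: no learning rate, no momentum schedule."""
    n, p = X.shape
    beta = np.zeros(p)
    xtk = X.T @ (y - 0.5)  # X^T kappa stays fixed across iterations
    history = [penalized_nll(beta, X, y, lam)]
    for _ in range(n_iter):
        # E-step: latent weights per observation, computed from the model.
        w = pg_weight(X @ beta)
        # M-step: weighted least squares, (X^T W X + lam I) beta = X^T kappa.
        H = X.T @ (w[:, None] * X) + lam * np.eye(p)
        beta = np.linalg.solve(H, xtk)
        history.append(penalized_nll(beta, X, y, lam))
    return beta, history
```

In this sketch the recorded `history` should be nonincreasing, which mirrors the monotonicity property the abstract attributes to the base algorithm; Nesterov extrapolation is deliberately left out because the abstract notes it gives up that guarantee.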

Keywords

mixture models and EM algorithm; maximum likelihood estimation; sampling algorithms

Cite

@article{arxiv.2602.14280,
  title  = {Fast Compute for ML Optimization},
  author = {Nick Polson and Vadim Sokolov},
  journal= {arXiv preprint arXiv:2602.14280},
  year   = {2026}
}
