AutoGMM: Automatic Gaussian Mixture Modeling in Python

Machine Learning, 2025-09-10, v6

Abstract

The exponential growth of complex data demands fully automatic clustering. Gaussian mixture models (GMMs) provide uncertainty-aware grouping, but they often require expertise to specify hyperparameters such as the number of components and the covariance structure. While mclust (R) automates these choices via the Bayesian Information Criterion (BIC), Python lacks a comparable tool. We introduce AutoGMM, an open-source Python package that automates GMM fitting through two mechanisms: strategic initialization using an agglomerative Mahalanobis heuristic, and parallelized model selection by information criteria. AutoGMM is a drop-in tool that yields strong out-of-the-box performance on classic benchmarks, targeted stress tests, and two real datasets, with favorable runtime scaling. The code is available at https://github.com/neurodata/AutoGMM with tests and reproducible workflows.
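To make the model-selection idea concrete, the following is a minimal sketch of BIC-based selection over component count and covariance structure. It is written against scikit-learn's `GaussianMixture`, not against AutoGMM's own API, and it does not reproduce the agglomerative Mahalanobis initialization or the parallelization described in the abstract (scikit-learn's default k-means initialization is used instead). The helper name `select_gmm_by_bic`, the search ranges, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, max_components=6,
                      covariance_types=("spherical", "diag", "tied", "full"),
                      random_state=0):
    """Hypothetical helper: fit one GMM per (component count, covariance
    structure) pair and return the fit with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for covariance_type in covariance_types:
        for k in range(1, max_components + 1):
            model = GaussianMixture(
                n_components=k,
                covariance_type=covariance_type,
                random_state=random_state,
            ).fit(X)
            bic = model.bic(X)  # scikit-learn convention: lower is better
            if bic < best_bic:
                best_model, best_bic = model, bic
    return best_model, best_bic


# Synthetic data (an assumption for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=900,
    centers=[[-6, -6], [0, 6], [6, -6]],
    cluster_std=1.0,
    random_state=0,
)

best_model, best_bic = select_gmm_by_bic(X)
labels = best_model.predict(X)            # hard cluster assignments
posteriors = best_model.predict_proba(X)  # per-point membership probabilities
print(best_model.n_components, best_model.covariance_type)
```

The posterior membership probabilities returned by `predict_proba` are what make the grouping uncertainty-aware: each point receives a probability for every component rather than only a hard label.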

Cite

@article{arxiv.1909.02688,
  title  = {AutoGMM: Automatic Gaussian Mixture Modeling in Python},
  author = {Tingshan Liu and Thomas L. Athey and Benjamin D. Pedigo and Joshua T. Vogelstein},
  journal= {arXiv preprint arXiv:1909.02688},
  year   = {2025}
}