AutoGMM: Automatic Gaussian Mixture Modeling in Python
Abstract
The exponential growth of complex data demands fully automatic clustering. Gaussian mixture models (GMMs) provide uncertainty-aware grouping, but they often require expertise to specify hyperparameters such as the number of components and the covariance structure. While mclust (R) automates these choices via the Bayesian Information Criterion (BIC), Python lacks a comparable tool. We introduce AutoGMM, an open-source Python package that automates GMM fitting through two mechanisms: strategic initialization using an agglomerative Mahalanobis heuristic, and parallelized model selection by information criteria. AutoGMM is a drop-in tool that yields strong out-of-the-box performance on classic benchmarks, targeted stress tests, and two real datasets, with favorable runtime scaling. The code is available at https://github.com/neurodata/AutoGMM with tests and reproducible workflows.
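To make the idea in the abstract concrete, here is a minimal sketch of BIC-based model selection over component count and covariance structure, seeded by agglomerative clustering. It uses only scikit-learn and is not the AutoGMM API. The function name `select_gmm_by_bic` and its parameters are hypothetical, and plain Ward linkage stands in for the agglomerative Mahalanobis heuristic, which is not reproduced here. The sketch also runs serially rather than in parallel.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, max_components=6, random_state=0):
    """Hypothetical helper: fit GMMs over a small grid and keep the lowest-BIC model."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        # Assumption: Ward-linkage agglomerative clustering provides the initial
        # partition (a stand-in for the Mahalanobis-based heuristic in the abstract).
        if k == 1:
            labels = np.zeros(len(X), dtype=int)
        else:
            labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        # Initial component means are the centroids of the agglomerative clusters.
        means_init = np.vstack([X[labels == j].mean(axis=0) for j in range(k)])
        for cov_type in ("full", "tied", "diag", "spherical"):
            gmm = GaussianMixture(
                n_components=k,
                covariance_type=cov_type,
                means_init=means_init,
                random_state=random_state,
            ).fit(X)
            bic = gmm.bic(X)  # scikit-learn convention: lower BIC is better
            if bic < best_bic:
                best_model, best_bic = gmm, bic
    return best_model, best_bic


# Usage on synthetic data: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
model, bic = select_gmm_by_bic(X_demo)
print(model.n_components, model.covariance_type)
```

Because each (component count, covariance type) pair is fit independently, the grid parallelizes naturally, which is presumably what makes parallelized model selection practical for the package.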
Cite
@article{arxiv.1909.02688,
title = {AutoGMM: Automatic Gaussian Mixture Modeling in Python},
author = {Tingshan Liu and Thomas L. Athey and Benjamin D. Pedigo and Joshua T. Vogelstein},
journal = {arXiv preprint arXiv:1909.02688},
year = {2025}
}