A Scalable and Cloud-Native Hyperparameter Tuning System

Distributed, Parallel, and Cluster Computing 2020-06-11 v2 Machine Learning

Abstract

In this paper, we introduce Katib: a scalable, cloud-native, and production-ready hyperparameter tuning system that is agnostic to the underlying machine learning framework. Although multiple hyperparameter tuning systems are available, this is the first one that caters to the needs of both users and administrators of the system. We present the motivation and design of the system and contrast it with existing hyperparameter tuning systems, especially in terms of multi-tenancy, scalability, fault tolerance, and extensibility. Katib can be deployed on local machines, or hosted as a service in on-premise data centers or in private or public clouds. We demonstrate the advantages of our system using experimental results as well as real-world production use cases. Katib has active contributors from multiple companies and is open-sourced at https://github.com/kubeflow/katib under the Apache 2.0 license.

Cite

@article{arxiv.2006.02085,
  title  = {A Scalable and Cloud-Native Hyperparameter Tuning System},
  author = {Johnu George and Ce Gao and Richard Liu and Hou Gang Liu and Yuan Tang and Ramdoot Pydipaty and Amit Kumar Saha},
  journal = {arXiv preprint arXiv:2006.02085},
  year   = {2020}
}

Comments

Fixed some typos, no content change at all from previous version
