As deep learning becomes more expensive in both time and compute, inefficiencies in machine learning (ML) training put state-of-the-art models out of practical reach for most users. The newest model architectures are simply too large to fit on a single processor. To address this, many ML practitioners have turned to model parallelism, which distributes a model's computational requirements across several devices. Unfortunately, the sequential nature of neural networks leads to very low efficiency and device utilization in model-parallel training jobs. We propose a new form of "shard parallelism" that combines task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
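The utilization argument in the abstract can be illustrated with a toy scheduling model. The sketch below is not Hydra's scheduler. It is a minimal simulation under stated assumptions: shard k of every model is pinned to device k, every shard takes one time step, and only the forward pass is modeled. The function names are hypothetical.

```python
def sequential_makespan(num_models, num_shards):
    """Model parallelism, one model at a time.

    Shards of a single model run strictly in order, so only one
    device is busy per time step.
    """
    return num_models * num_shards


def shard_parallel_makespan(num_models, num_shards):
    """Greedy schedule over shard tasks drawn from all models.

    Assumptions (illustrative only): shard k of every model lives on
    device k, each shard takes one time step, forward pass only.
    """
    next_shard = [0] * num_models  # next shard index each model must run
    steps = 0
    while any(s < num_shards for s in next_shard):
        busy_devices = set()
        ran = []
        for model, shard in enumerate(next_shard):
            # A shard can run if its model is unfinished and its device is free.
            if shard < num_shards and shard not in busy_devices:
                busy_devices.add(shard)
                ran.append(model)
        for model in ran:
            next_shard[model] += 1
        steps += 1
    return steps


def utilization(num_models, num_shards, makespan):
    """Fraction of device-time slots that did useful work."""
    return (num_models * num_shards) / (num_shards * makespan)


if __name__ == "__main__":
    models, shards = 4, 3
    for name, span in [
        ("sequential", sequential_makespan(models, shards)),
        ("shard-parallel", shard_parallel_makespan(models, shards)),
    ]:
        print(name, span, round(utilization(models, shards, span), 3))
```

In this toy setting, interleaving shards from different models keeps several devices busy at once, while the one-model-at-a-time baseline leaves all but one device idle at every step. Real workloads add backward passes, uneven shard costs, and memory limits, none of which are modeled here.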
@article{arxiv.2107.06469,
title = {Model-Parallel Model Selection for Deep Learning Systems},
author = {Kabir Nagrecha},
journal = {arXiv preprint arXiv:2107.06469},
year = {2021}
}
Comments
2 pages, 3 figures. 1st place winner of ACM SIGMOD '21 Student Research Competition. Appeared in ACM SIGMOD/PODS '21 Proceedings