Model-Parallel Model Selection for Deep Learning Systems

Distributed, Parallel, and Cluster Computing 2021-07-15 v1 Machine Learning

Abstract

As deep learning grows more expensive in both time and compute, inefficiencies in machine learning (ML) training put state-of-the-art models out of practical reach for most users. The newest model architectures are simply too large to fit on a single processor. To address this, many ML practitioners have turned to model parallelism, which distributes a model's computational requirements across several devices. Unfortunately, the sequential nature of neural networks leads to very low efficiency and device utilization in model-parallel training jobs: each part of the network must wait for the output of the part before it, so most devices sit idle at any given moment. We propose a new form of "shard parallelism" that combines task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
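The utilization argument can be illustrated with a toy scheduling model. The sketch below is not Hydra's implementation or API; all function names are hypothetical. It assumes every shard takes one unit of time, counts only a single forward pass per model, lets any ready shard run on any free device, and ignores memory transfer costs. Under those assumptions it compares training models one at a time with sequential shards against interleaving shards from several models.

```python
import math


def sequential_model_parallel_steps(num_models: int, num_shards: int) -> int:
    """Time steps when models run one after another and each model's
    shards run strictly in order, so only one device is busy per step."""
    return num_models * num_shards


def shard_parallel_steps(num_models: int, num_shards: int, num_devices: int) -> int:
    """Time steps for a greedy schedule over independent model shards.

    Shards of the same model stay ordered, but shards of different
    models are independent, so each step advances up to num_devices
    distinct models by one shard (assumption: unit-time shards).
    """
    progress = [0] * num_models  # next shard index to run, per model
    steps = 0
    while any(p < num_shards for p in progress):
        ready = [m for m in range(num_models) if progress[m] < num_shards]
        # Favor the models that have made the least progress so far.
        ready.sort(key=lambda m: progress[m])
        for m in ready[:num_devices]:
            progress[m] += 1
        steps += 1
    return steps


def utilization(num_models: int, num_shards: int, num_devices: int, steps: int) -> float:
    """Fraction of device-time slots that did useful shard work."""
    return (num_models * num_shards) / (steps * num_devices)


if __name__ == "__main__":
    models, shards, devices = 4, 4, 4
    seq = sequential_model_parallel_steps(models, shards)
    par = shard_parallel_steps(models, shards, devices)
    print("sequential:", seq, utilization(models, shards, devices, seq))
    print("shard-parallel:", par, utilization(models, shards, devices, par))
```

In this idealized setting, the sequential schedule keeps only one device in `devices` busy at a time, while the interleaved schedule fills idle devices with shards from other models. Real workloads add backward passes, uneven shard costs, and data movement, so the gap will differ in practice.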

Cite

@article{arxiv.2107.06469,
  title  = {Model-Parallel Model Selection for Deep Learning Systems},
  author = {Kabir Nagrecha},
  journal= {arXiv preprint arXiv:2107.06469},
  year   = {2021}
}

Comments

2 pages, 3 figures. 1st place winner of the ACM SIGMOD '21 Student Research Competition. Appeared in the ACM SIGMOD/PODS '21 Proceedings.