Model-Parallel Model Selection for Deep Learning Systems

Kabir Nagrecha

doi:10.1145/3448016.3450571

Distributed, Parallel, and Cluster Computing; Machine Learning. 2021-07-15, v1

Abstract

As deep learning becomes more expensive in both time and compute, inefficiencies in machine learning (ML) training put state-of-the-art models out of practical reach for most users. The newest model architectures are simply too large to fit on a single processor. To address this, many ML practitioners have turned to model parallelism, which distributes a model's computational requirements across several devices. Unfortunately, the sequential nature of neural networks causes very low efficiency and device utilization in model-parallel training jobs. We propose "shard parallelism," a new form of parallelism that combines task and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
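
To make the utilization argument concrete, the following is a toy sketch, not Hydra's implementation. It assumes every shard costs one time step, any shard can run on any free device, transfer costs are zero, and only a single pass over each model is scheduled. The function names and the greedy policy are hypothetical, chosen for illustration only.

```python
def sequential_model_parallel_makespan(num_models, shards_per_model):
    """Traditional model parallelism in a multi-model job (toy model).

    Models are trained one after another, and the shards of a model must
    run in order, so only one device is busy at any time step.
    """
    return num_models * shards_per_model


def shard_parallel_makespan(num_models, shards_per_model, num_devices):
    """Shard-level task parallelism across models (toy greedy schedule).

    Shards of the same model still run in order, but shards of different
    models are independent, so up to num_devices models can each advance
    by one shard per time step.
    """
    remaining = [shards_per_model] * num_models  # shards left per model
    steps = 0
    while any(remaining):
        # Each unfinished model has exactly one ready shard (its next one).
        # Hypothetical policy: serve the models with the most work left first.
        ready = sorted(
            (m for m in range(num_models) if remaining[m] > 0),
            key=lambda m: -remaining[m],
        )
        for m in ready[:num_devices]:
            remaining[m] -= 1
        steps += 1
    return steps


if __name__ == "__main__":
    baseline = sequential_model_parallel_makespan(4, 4)
    sharded = shard_parallel_makespan(4, 4, num_devices=4)
    print(f"sequential: {baseline} steps, shard-parallel: {sharded} steps")
```

Under these assumptions, the gain comes entirely from having several models in flight: with a single model, the shard-parallel schedule takes the same number of steps as the sequential one.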

Keywords

parallel algorithm, parallel programming, deep learning

Cite

@article{arxiv.2107.06469,
  title  = {Model-Parallel Model Selection for Deep Learning Systems},
  author = {Kabir Nagrecha},
  journal= {arXiv preprint arXiv:2107.06469},
  year   = {2021}
}

Comments

2 pages, 3 figures. First-place winner of the ACM SIGMOD '21 Student Research Competition. Appeared in the ACM SIGMOD/PODS '21 Proceedings.
