DARC: Differentiable ARchitecture Compression

Machine Learning 2019-05-21 v1 Computer Vision and Pattern Recognition Machine Learning

Abstract

In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet with 97.2% accuracy on CIFAR-10, we improve single-sample inference speed by 2.28× and memory footprint by 5.64×, with no accuracy loss. For a ResNet with 79.15% Top-1 accuracy on ImageNet, we improve batch inference speed by 1.29× and memory footprint by 3.57× with 1% accuracy loss. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids overfitting despite over-parameterization.
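The abstract does not spell out how the choice of cheaper sub-components is made differentiable. One common way to relax such a discrete choice, assumed here purely for illustration, is a softmax-weighted mixture over candidate operations combined with a penalty on expected cost. The sketch below is a toy in pure Python: the candidate functions, their costs, and the names `CANDIDATES`, `objective`, `train`, and `select` are all hypothetical and not taken from the paper.

```python
import math

# Hypothetical candidates for ONE sub-component: (name, function, relative cost).
# These toy scalar functions stand in for real network layers.
CANDIDATES = [
    ("original", lambda x: 3.0 * x + 1.0, 10.0),
    ("cheap", lambda x: 3.0 * x, 2.0),
    ("identity", lambda x: x, 0.0),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_output(x, logits):
    """Softmax-weighted mixture of all candidate outputs (the relaxation)."""
    weights = softmax(logits)
    return sum(w * f(x) for w, (_, f, _) in zip(weights, CANDIDATES))


def expected_cost(logits):
    """Cost of the mixture: each candidate's cost weighted by its probability."""
    weights = softmax(logits)
    return sum(w * c for w, (_, _, c) in zip(weights, CANDIDATES))


def objective(logits, data, penalty):
    """Mean squared error of the mixture plus penalty * expected cost."""
    mse = sum((mixed_output(x, logits) - y) ** 2 for x, y in data) / len(data)
    return mse + penalty * expected_cost(logits)


def train(logits, data, penalty, lr=0.5, steps=300, eps=1e-5):
    """Gradient descent on the logits using central finite differences.

    A real implementation would use automatic differentiation; finite
    differences keep this sketch dependency-free.
    """
    logits = list(logits)
    for _ in range(steps):
        grads = []
        for i in range(len(logits)):
            up, down = logits[:], logits[:]
            up[i] += eps
            down[i] -= eps
            diff = objective(up, data, penalty) - objective(down, data, penalty)
            grads.append(diff / (2 * eps))
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return logits


def select(logits):
    """Discretize: keep only the candidate with the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CANDIDATES[best][0]
```

For instance, with targets that the cheaper candidate fits exactly, `train([0.0, 0.0, 0.0], [(-1.0, -3.0), (0.0, 0.0), (1.0, 3.0)], penalty=0.01)` followed by `select(...)` picks the cheaper candidate over the original. The penalty weight trades accuracy against cost; how DARC itself sets up and regularizes this trade-off is described in the paper, not in this sketch.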

Cite

@article{arxiv.1905.08170,
  title   = {DARC: Differentiable ARchitecture Compression},
  author  = {Shashank Singh and Ashish Khetan and Zohar Karnin},
  journal = {arXiv preprint arXiv:1905.08170},
  year    = {2019}
}