Differentiable Architecture Pruning for Transfer Learning

Machine Learning 2021-07-08 v1 Computer Vision and Pattern Recognition Machine Learning

Abstract

We propose a new gradient-based approach for extracting sub-architectures from a given large model. Unlike existing pruning methods, which cannot disentangle the network architecture from the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently of the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
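
To make the idea of a complexity-penalized subset-selection problem with a temperature relaxation more concrete, here is a minimal generic sketch. It is not the paper's algorithm: the abstract does not specify the two-temperature scheme, so this sketch uses a single temperature, and the function names, the sigmoid relaxation, and the penalty weight `lam` are all assumptions chosen for illustration.

```python
import math

def relaxed_mask(logits, temperature):
    """Map real-valued scores to soft keep-values between 0 and 1.

    Hypothetical relaxation: a temperature-scaled sigmoid. Lower
    temperatures push the values towards a hard 0/1 selection.
    """
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]

def penalized_objective(task_loss, mask, lam):
    """Task loss plus a penalty on the (soft) number of kept components."""
    return task_loss + lam * sum(mask)

def hard_subset(logits):
    """Discrete selection that the relaxation approaches as temperature -> 0."""
    return [i for i, z in enumerate(logits) if z > 0]

# One score per candidate component of the large model (illustrative values).
logits = [2.0, -1.0, 0.5, -3.0]

soft = relaxed_mask(logits, temperature=1.0)    # smooth, gradient-friendly
sharp = relaxed_mask(logits, temperature=0.05)  # close to a binary mask
selected = hard_subset(logits)                  # indices of kept components
```

In this toy setting the soft mask is differentiable in the scores, so a gradient-based optimizer could trade off the task loss against the complexity term, while the low-temperature mask approximates the discrete subset that would finally be extracted.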

Cite

@article{arxiv.2107.03375,
  title  = {Differentiable Architecture Pruning for Transfer Learning},
  author = {Nicolo Colombo and Yang Gao},
  journal= {arXiv preprint arXiv:2107.03375},
  year   = {2021}
}

Comments

19 pages (main + appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021