We propose a new gradient-based approach for extracting sub-architectures from a given large model. In contrast to existing pruning methods, which cannot disentangle the network architecture from the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently of the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
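To make the phrase "complexity-penalized subset-selection problem" concrete, the following is a minimal sketch of a generic relaxation of that kind, not the algorithm of the paper. It assumes a toy linear model with fixed weights, one sigmoid gate per unit, a single temperature, and a penalty proportional to the sum of the gates; the paper's two-temperature scheme is not reproduced here. All names (`relaxed_gates`, `select_subset`, `make_toy_data`, `lam`, `temperature`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_gates(theta, temperature):
    # Continuous surrogate for a binary keep/drop mask (assumed form).
    return [sigmoid(t / temperature) for t in theta]


def select_subset(weights, xs, ys, lam=0.05, temperature=1.0, lr=0.5, steps=500):
    """Gradient descent on gate logits; the model weights stay fixed.

    Objective (an assumption of this sketch):
        mean((sum_i g_i * w_i * x_i - y)^2) + lam * sum_i g_i
    """
    d, n = len(weights), len(xs)
    theta = [0.0] * d
    for _ in range(steps):
        g = relaxed_gates(theta, temperature)
        # Gradient of the objective with respect to each gate value.
        grad_g = [lam] * d
        for x, y in zip(xs, ys):
            pred = sum(g[i] * weights[i] * x[i] for i in range(d))
            err = pred - y
            for i in range(d):
                grad_g[i] += 2.0 * err * weights[i] * x[i] / n
        # Chain rule through the sigmoid: dg/dtheta = g * (1 - g) / temperature.
        for i in range(d):
            theta[i] -= lr * grad_g[i] * g[i] * (1.0 - g[i]) / temperature
    # Harden the relaxed mask: keep units whose gate exceeds 0.5.
    return [i for i in range(d) if theta[i] > 0.0]


def make_toy_data(seed=0, n=200, d=4):
    # Toy data in which only units 0 and 1 influence the target.
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [x[0] + x[1] for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print(select_subset([1.0] * 4, xs, ys))
```

In this sketch the penalty weight `lam` trades accuracy against the number of retained units, and only the gate logits are trained, which loosely mirrors the idea of learning a structure separately from the weights attached to it.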
@article{arxiv.2107.03375,
title = {Differentiable Architecture Pruning for Transfer Learning},
author = {Nicolo Colombo and Yang Gao},
journal = {arXiv preprint arXiv:2107.03375},
year = {2021}
}
Comments
19 pages (main + appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021