Vision Transformer Pruning

Computer Vision and Pattern Recognition 2021-08-17 v4

Abstract

Vision transformers have achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands hinder their deployment on mobile devices. Here we present a vision transformer pruning approach that identifies the impact of each dimension in every layer of the transformer and then prunes accordingly. By encouraging dimension-wise sparsity in the transformer, important dimensions emerge automatically. A large number of dimensions with small importance scores can then be discarded, achieving a high pruning ratio without significantly compromising accuracy. The pipeline for vision transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning dimensions of linear projections; 3) fine-tuning. The reductions in parameters and FLOPs achieved by the proposed algorithm are evaluated and analyzed on the ImageNet dataset to demonstrate the effectiveness of the method.
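The first two stages of the pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each linear projection's input dimensions are multiplied by a learnable importance score, that the sparsity regularizer is an L1 penalty on those scores, and that pruning keeps a fixed fraction of dimensions with the largest |score|. The helper names (`gated_linear`, `sparsity_penalty`, `select_dimensions`, `prune_input_dims`, `param_reduction`) and the toy values are hypothetical, and the fine-tuning stage is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # row-major weight: weight[out_dim][in_dim]


def gated_linear(weight: Matrix, scores: Sequence[float], x: Sequence[float]) -> List[float]:
    """Linear projection whose input dimensions are scaled by importance
    scores (assumed gating form): y = W (scores * x)."""
    gated = [a * xi for a, xi in zip(scores, x)]
    return [sum(w * g for w, g in zip(row, gated)) for row in weight]


def sparsity_penalty(scores: Sequence[float], lam: float) -> float:
    """Stage 1 (assumed form): L1 penalty on the scores, added to the task
    loss during training so that unimportant scores shrink toward zero."""
    return lam * sum(abs(a) for a in scores)


def select_dimensions(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Stage 2 (assumed rule): keep the fraction `keep_ratio` of dimensions
    with the largest |score|; return their indices in original order."""
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])


def prune_input_dims(weight: Matrix, keep: Sequence[int]) -> Matrix:
    """Drop the weight columns of input dimensions that were not kept."""
    return [[row[j] for j in keep] for row in weight]


def param_reduction(weight: Matrix, keep: Sequence[int]) -> float:
    """Fraction of this layer's weights removed (bias ignored). Per-token
    multiply-adds scale the same way, as out_dim * in_dim."""
    before = len(weight) * len(weight[0])
    after = len(weight) * len(keep)
    return 1.0 - after / before


if __name__ == "__main__":
    # Toy scores as they might look after sparse training (made-up values).
    scores = [0.9, 0.01, 0.5, 0.0, 0.3, 0.02]
    weight = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]]
    x = [1.0, -1.0, 2.0, 0.5, -0.5, 3.0]

    keep = select_dimensions(scores, keep_ratio=0.5)
    pruned_w = prune_input_dims(weight, keep)
    pruned_out = gated_linear(pruned_w, [scores[j] for j in keep], [x[j] for j in keep])
    print("kept dims:", keep)  # kept dims: [0, 2, 4]
    print("pruned output:", pruned_out)
    print("weights removed:", param_reduction(weight, keep))  # weights removed: 0.5
```

Setting the dropped scores to zero in the unpruned layer gives exactly the output of the smaller pruned layer, which is why dimensions whose scores the L1 term has driven close to zero can be removed with little change in output; fine-tuning (stage 3, not shown) is then meant to recover the remaining accuracy gap.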

Cite

@article{arxiv.2104.08500,
  title  = {Vision Transformer Pruning},
  author = {Mingjian Zhu and Yehui Tang and Kai Han},
  journal= {arXiv preprint arXiv:2104.08500},
  year   = {2021}
}

Comments

Accepted by the KDD 2021 Workshop on Model Mining