Dual PatchNorm

Computer Vision and Pattern Recognition 2023-05-09 v3 Machine Learning

Abstract

We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), placed before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of an exhaustive search over alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification often leads to improved accuracy over well-tuned Vision Transformers and never hurts.
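The placement described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`layer_norm`, `patchify`, `dual_patchnorm_embed`), the parameter shapes, and the `eps` value are assumptions made for illustration, and the learnable scale and offset that a LayerNorm normally carries are omitted for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each row (one patch or one token) to zero mean, unit variance.
    # Assumption: learnable scale/offset parameters are left out of this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patchify(image, patch):
    # Split an (H, W, C) image into non-overlapping patch x patch tiles and
    # flatten each tile into a vector of length patch * patch * C.
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def dual_patchnorm_embed(image, weight, bias, patch):
    # LayerNorm -> linear patch embedding -> LayerNorm.
    patches = patchify(image, patch)
    x = layer_norm(patches)      # first LayerNorm, before the embedding
    x = x @ weight + bias        # patch embedding as a dense projection
    return layer_norm(x)         # second LayerNorm, after the embedding


# Usage with hypothetical sizes: an 8x8 RGB image, 4x4 patches, width 8.
rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))
weight = rng.normal(size=(4 * 4 * 3, 8))
bias = np.zeros(8)
tokens = dual_patchnorm_embed(image, weight, bias, patch=4)
```

Here `tokens` has one row per patch. In a full Vision Transformer these rows would then be passed on to the rest of the model, which this sketch leaves out.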

Cite

@article{arxiv.2302.01327,
  title  = {Dual PatchNorm},
  author = {Manoj Kumar and Mostafa Dehghani and Neil Houlsby},
  journal= {arXiv preprint arXiv:2302.01327},
  year   = {2023}
}

Comments

TMLR 2023 (https://openreview.net/forum?id=jgMqve6Qhw)
