
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Subjects: Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition (version v3, 2021-11-10)

Abstract

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
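The idea in the abstract (map a multi-dimensional position through a learnable Fourier feature layer, then an MLP) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The class name `FourierPositionalEncoding`, the Gaussian initialization of the frequency matrix `W_r` with scale `1/gamma`, the two-layer MLP with a GELU activation, and all sizes are assumptions chosen for illustration. Only the forward pass is shown, so in a real model `W_r` and the MLP weights would be updated by gradient descent. The usage part also checks why such features suit $L_2$ relationships: because $\cos a \cos b + \sin a \sin b = \cos(a - b)$, the dot product of two Fourier feature vectors depends only on the offset between the positions. With Gaussian frequencies it approximates the kernel $\exp(-\lVert x - y \rVert^2 / (2\gamma^2))$, which is the standard random Fourier features argument.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation (activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class FourierPositionalEncoding:
    """Hypothetical sketch: Fourier features of an M-dim position, then an MLP."""

    def __init__(self, pos_dim, num_freqs, hidden_dim, out_dim, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frequency matrix W_r with shape (num_freqs, pos_dim). It would be trainable
        # in a real model; here it is only initialized from N(0, gamma^-2).
        self.W_r = rng.normal(0.0, 1.0 / gamma, size=(num_freqs, pos_dim))
        # Two-layer MLP that modulates the Fourier features (also trainable in practice).
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(2 * num_freqs), size=(2 * num_freqs, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def fourier_features(self, x):
        # x: (N, pos_dim) -> (N, 2 * num_freqs); the scaling gives every row unit L2 norm
        proj = x @ self.W_r.T
        scale = 1.0 / np.sqrt(self.W_r.shape[0])
        return scale * np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

    def __call__(self, x):
        # Positional encoding = MLP(Fourier features), shape (N, out_dim)
        r = self.fourier_features(x)
        h = gelu(r @ self.W1 + self.b1)
        return h @ self.W2 + self.b2


# 2-D pixel positions on a 4x4 grid, one (row, col) pair per pixel
ys, xs = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
positions = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(float)  # (16, 2)

pe = FourierPositionalEncoding(pos_dim=2, num_freqs=64, hidden_dim=128, out_dim=32, gamma=2.0)
encodings = pe(positions)
print(encodings.shape)  # (16, 32)

# Feature dot products depend only on the offset between positions:
# pixels (0,0)->(1,1) and (2,2)->(3,3) share the offset (1,1), so the dot products match.
r = pe.fourier_features(positions)
print(np.isclose(r[0] @ r[5], r[10] @ r[15]))  # True
```

Keeping the frequencies learnable, rather than fixed as in sinusoidal encodings, lets the model adapt the effective kernel width and orientation to the task. The MLP then adds capacity for positional relationships that a single shift-invariant kernel cannot express.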


Cite

@article{arxiv.2106.02795,
  title  = {Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding},
  author = {Yang Li and Si Si and Gang Li and Cho-Jui Hsieh and Samy Bengio},
  journal= {arXiv preprint arXiv:2106.02795},
  year   = {2021}
}

Comments

35th Conference on Neural Information Processing Systems (NeurIPS 2021)