English

Linearized Relative Positional Encoding

Computation and Language 2023-07-19 v1

Abstract

Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.

Keywords

Cite

@article{arxiv.2307.09270,
  title  = {Linearized Relative Positional Encoding},
  author = {Zhen Qin and Weixuan Sun and Kaiyue Lu and Hui Deng and Dongxu Li and Xiaodong Han and Yuchao Dai and Lingpeng Kong and Yiran Zhong},
  journal= {arXiv preprint arXiv:2307.09270},
  year   = {2023}
}

Comments

Reviewed by TMLR, decision pending. Yiran Zhong is the corresponding author. Code is available at https://github.com/OpenNLPLab/Lrpe