Related papers: Relative Positional Encoding for Transformers with…

GRPE: Relative Positional Encoding for Graph Transformer

We propose a novel positional encoding for learning graph on Transformer architecture. Existing approaches either linearize a graph to encode absolute position in the sequence of nodes, or encode relative position with another node using…

Machine Learning · Computer Science 2022-10-17 Wonpyo Park, Woonggi Chang, Donggeon Lee, Juntae Kim, Seung-won Hwang

CoPE: A Lightweight Complex Positional Encoding

Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across…

Machine Learning · Computer Science 2025-08-27 Avinash Amballa

Your Transformer May Not be as Powerful as You Expect

Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based…

Machine Learning · Computer Science 2022-10-31 Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

Linearized Relative Positional Encoding

Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer,…

Computation and Language · Computer Science 2023-07-19 Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to…

Computation and Language · Computer Science 2024-11-06 Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information. Absolute…

Machine Learning · Computer Science 2021-11-10 Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov

Conditional Positional Encodings for Vision Transformers

We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are pre-defined and independent of input tokens, CPE is dynamically generated and conditioned…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). An absolute positional embedding method that can extrapolate to sequences of lengths longer than the ones it…

Computation and Language · Computer Science 2025-10-06 Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Round and Round We Go! What makes Rotary Positional Encodings useful?

Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most popular types of encoding used today in…

Computation and Language · Computer Science 2025-05-14 Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković

Positional Encodings for Light Curve Transformers: Playing with Positions and Attention

We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach to…

Instrumentation and Methods for Astrophysics · Physics 2023-08-15 Daniel Moreno-Cartagena, Guillermo Cabrera-Vives, Pavlos Protopapas, Cristobal Donoso-Oliva, Manuel Pérez-Carrasco, Martina Cádiz-Leyton

Rethinking and Improving Relative Position Encoding for Vision Transformer

Relative position encoding (RPE) is important for transformer to capture sequence ordering of input tokens. General efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and…

Computer Vision and Pattern Recognition · Computer Science 2021-07-30 Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao

LieRE: Lie Rotational Positional Encodings

Transformer architectures rely on position encodings to model the spatial structure of input data. Rotary Position Encoding (RoPE) is a widely used method in language models that encodes relative positions through fixed, block-diagonal,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay Chaudhari, Curtis Langlotz

SeqPE: Transformer with Sequential Position Encoding

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models

There are several improvements proposed over the baseline Absolute Positional Encoding (APE) method used in original transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in…

Computation and Language · Computer Science 2024-05-09 Arpit Aggarwal

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Positional encoding (PE) underpins how permutation-invariant Transformers represent sequence order, yet how positional information is processed and stored remains poorly understood. Modern PE methods such as RoPE still struggle on tasks…

Computation and Language · Computer Science 2026-05-29 Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agnostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-15 Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Towards More Efficient Insertion Transformer with Fractional Positional Encoding

Auto-regressive neural sequence models have been shown to be effective across text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. Insertion Transformer (Stern et al., 2019) is an…

Computation and Language · Computer Science 2023-02-01 Zhisong Zhang, Yizhe Zhang, Bill Dolan

PermuteFormer: Efficient Relative Position Encoding for Long Sequences

A recent variation of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In…

Computation and Language · Computer Science 2021-09-09 Peng Chen

Do traveling waves make good positional encodings?

Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize…

Machine Learning · Computer Science 2025-11-18 Chase van de Geijn, Ayush Paliwal, Timo Lüddecke, Alexander S. Ecker

Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers

Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling…

Machine Learning · Computer Science 2026-02-17 Atsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo