Related papers: Dynamic Position Encoding for Transformers

Learning to Encode Position for Transformer with Continuous Dynamical Model

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent models are…

Machine Learning · Computer Science 2020-03-23 Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). An absolute positional embedding method that can extrapolate to sequences of lengths longer than the ones it…

Computation and Language · Computer Science 2025-10-06 Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Improve Transformer Models with Better Relative Position Embeddings

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a…

Computation and Language · Computer Science 2020-09-30 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

A Simple and Effective Positional Encoding for Transformers

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with…

Computation and Language · Computer Science 2021-11-04 Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

SeqPE: Transformer with Sequential Position Encoding

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

Multiplicative Position-aware Transformer Models for Language Understanding

Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional…

Computation and Language · Computer Science 2021-09-28 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information. Absolute…

Machine Learning · Computer Science 2021-11-10 Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov

RoFormer: Enhanced Transformer with Rotary Position Embedding

Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various…

Computation and Language · Computer Science 2023-11-09 Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

Position Information in Transformers: An Overview

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential and word order is…

Computation and Language · Computer Science 2021-09-10 Philipp Dufter, Martin Schmitt, Hinrich Schütze

Explicit Reordering for Neural Machine Translation

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve…

Computation and Language · Computer Science 2020-04-09 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept breaking out, and most focus on designing different pre-training objectives or variants of…

Computation and Language · Computer Science 2020-10-13 Yu-An Wang, Yun-Nung Chen

CoPE: A Lightweight Complex Positional Encoding

Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across…

Machine Learning · Computer Science 2025-08-27 Avinash Amballa

Do traveling waves make good positional encodings?

Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize…

Machine Learning · Computer Science 2025-11-18 Chase van de Geijn, Ayush Paliwal, Timo Lüddecke, Alexander S. Ecker

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce…

Computation and Language · Computer Science 2025-08-22 Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to…

Computation and Language · Computer Science 2024-11-06 Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers

Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly…

Machine Learning · Computer Science 2026-05-07 Habib Irani, Vangelis Metsis

A 2D Semantic-Aware Position Encoding for Vision Transformers

Vision transformers have demonstrated significant advantages in computer vision tasks due to their ability to capture long-range dependencies and contextual relationships through self-attention. However, existing position encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit

Adapting Deep Learning (DL) techniques to automate non-trivial coding activities, such as code documentation and defect detection, has been intensively studied recently. Learning to predict code changes is one of the popular and essential…

Software Engineering · Computer Science 2022-08-02 Shiyi Qi, Yaoxian Li, Cuiyun Gao, Xiaohong Su, Shuzheng Gao, Zibin Zheng, Chuanyi Liu

Self-Attention with Cross-Lingual Position Representation

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in…

Computation and Language · Computer Science 2020-11-24 Liang Ding, Longyue Wang, Dacheng Tao

On the Geometry of Positional Encodings in Transformers

Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance,…

Machine Learning · Computer Science 2026-04-08 Giansalvo Cirrincione