Related papers: DAPE: Data-Adaptive Positional Encoding for Length…

200 papers

Several improvements have been proposed over the baseline Absolute Positional Encoding (APE) method used in the original Transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in…

Computation and Language · Computer Science 2024-05-09 Arpit Aggarwal

Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information. Absolute…

Machine Learning · Computer Science 2021-11-10 Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov

Rotary Positional Encoding (RoPE) is widely used in modern large language models. However, when sequences are extended beyond the range seen during training, rotary phases can enter out-of-distribution regimes, leading to spurious…

Machine Learning · Computer Science 2026-05-12 Riccardo Ali, Alessio Borgi, Christopher Irwin, Mario Severino, Pietro Liò

We prove under practical assumptions that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long-context. RoPE extension methods may alleviate this…

Computation and Language · Computer Science 2026-05-12 Yu Wang, Sheng Shen, Rémi Munos, Hongyuan Zhan, Yuandong Tian

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). An absolute positional embedding method that can extrapolate to sequences of lengths longer than the ones it…

Computation and Language · Computer Science 2025-10-06 Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists…

Machine Learning · Computer Science 2021-06-11 Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard

Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the development of Transformer-based language models. Positional encoding (PE) has been identified as a major…

Computation and Language · Computer Science 2023-11-08 Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy

Positional encoding mechanisms enable Transformers to model sequential structure and long-range dependencies in text. While absolute positional encodings struggle with extrapolation to longer sequences due to fixed positional…

Computation and Language · Computer Science 2025-09-09 Chang Dai, Hongyu Shan, Mingyang Song, Di Liang

Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across…

Machine Learning · Computer Science 2025-08-27 Avinash Amballa

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

In this study, we investigate the impact of positional encoding (PE) on source separation performance and the generalization ability to long sequences (length extrapolation) in Transformer-based time-frequency (TF) domain dual-path models.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Kohei Saijo, Tetsuji Ogawa

Transformers have demonstrated outstanding performance in many applications of deep learning. When applied to time series data, transformers require effective position encoding to capture the ordering of the time series data. The efficacy…

Machine Learning · Computer Science 2024-02-21 Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Mahsa Salehi

Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based…

Machine Learning · Computer Science 2022-10-31 Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term…

Computation and Language · Computer Science 2024-12-06 Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

Positional encoding is essential for large language models (LLMs) to represent sequence order, yet recent studies show that Rotary Position Embedding (RoPE) can induce massive activation. We investigate the source of these instabilities via…

Computation and Language · Computer Science 2026-01-07 Jing Xiong, Liyang Fan, Hui Shen, Zunhai Su, Min Yang, Lingpeng Kong, Ngai Wong

Vision transformers have demonstrated significant advantages in computer vision tasks due to their ability to capture long-range dependencies and contextual relationships through self-attention. However, existing position encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

We propose Parabolic Position Encoding (PaPE), a parabola-based position encoding for vision modalities in attention-based architectures. Given a set of vision tokens, such as from videos, event camera streams, images, or point clouds, our…

Relative position encoding (RPE) is important for Transformers to capture the sequence ordering of input tokens. Its general efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and…

Computer Vision and Pattern Recognition · Computer Science 2021-07-30 Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce…

Computation and Language · Computer Science 2025-08-22 Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

Positional encoding is a vital component of Transformer architectures, enabling models to incorporate sequence order into self-attention mechanisms. Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their…

Computation and Language · Computer Science 2025-08-01 Ali Veisi, Delaram Fartoot, Hamidreza Amirzadeh