Related papers: A Simple and Effective Positional Encoding for Tra…

Deconstructing Positional Information: From Attention Logits to Training Biases

Positional encodings enable Transformers to incorporate sequential information, yet their theoretical understanding remains limited to two properties: distance attenuation and translation invariance. Because natural language lacks purely…

Machine Learning · Computer Science 2026-02-11 Zihan Gu, Ruoyu Chen, Han Zhang, Hua Zhang, Yue Hu

Learning to Encode Position for Transformer with Continuous Dynamical Model

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent models are…

Machine Learning · Computer Science 2020-03-23 Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

Positional Encoding in Transformer-Based Time Series Models: A Survey

Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional…

Machine Learning · Computer Science 2026-05-07 Habib Irani, Vangelis Metsis

Improving Transformers using Faithful Positional Encoding

We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing…

Machine Learning · Computer Science 2024-05-17 Tsuyoshi Idé, Jokin Labaien, Pin-Yu Chen

Positional Encodings for Light Curve Transformers: Playing with Positions and Attention

We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach to…

Instrumentation and Methods for Astrophysics · Physics 2023-08-15 Daniel Moreno-Cartagena, Guillermo Cabrera-Vives, Pavlos Protopapas, Cristobal Donoso-Oliva, Manuel Pérez-Carrasco, Martina Cádiz-Leyton

Dynamic Position Encoding for Transformers

Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years. Transformers (Vaswani et al., 2017) have radically changed it by proposing a novel architecture that relies on a feed-forward…

Computation and Language · Computer Science 2022-10-25 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-Based Segmentation Networks

Most approaches for semantic segmentation use only information from color cameras to parse the scenes, yet recent advancements show that using depth data can further improve performance. In this work, we focus on transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Francesco Barbato, Giulia Rizzoli, Pietro Zanuttigh

PermuteFormer: Efficient Relative Position Encoding for Long Sequences

A recent variation of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In…

Computation and Language · Computer Science 2021-09-09 Peng Chen

Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism – a powerful tool originally…

Machine Learning · Computer Science 2024-05-17 Junfeng Chen, Kailiang Wu

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this…

Machine Learning · Computer Science 2021-11-10 Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

CoPE: A Lightweight Complex Positional Encoding

Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across…

Machine Learning · Computer Science 2025-08-27 Avinash Amballa

Positional Description Matters for Transformers Arithmetic

Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities, which paradoxically include remarkable coding abilities. We observe that a crucial challenge is…

Computation and Language · Computer Science 2023-11-28 Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

Improve Transformer Models with Better Relative Position Embeddings

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a…

Computation and Language · Computer Science 2020-09-30 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different…

Computation and Language · Computer Science 2020-10-06 Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings

Transformers with causal attention can solve tasks that require positional information without using positional encodings. In this work, we propose and investigate a new hypothesis about how positional information can be stored without…

Computation and Language · Computer Science 2025-01-03 Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy

A Length-Extrapolatable Transformer

Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then…

Computation and Language · Computer Science 2022-12-21 Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

Track Targets by Dense Spatio-Temporal Position Encoding

In this work, we propose a novel paradigm to encode the position of targets for target tracking in videos using transformers. The proposed paradigm, Dense Spatio-Temporal (DST) position encoding, encodes spatio-temporal position information…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Jinkun Cao, Hao Wu, Kris Kitani

Enhancing Transformers Through Conditioned Embedded Tokens

Transformers have transformed modern machine learning, driving breakthroughs in computer vision, natural language processing, and robotics. At the core of their success lies the attention mechanism, which enables the modeling of global…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Hemanth Saratchandran, Simon Lucey

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original…

Machine Learning · Computer Science 2021-11-04 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce…

Computation and Language · Computer Science 2025-08-22 Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang