Related papers: Learning to Encode Position for Transformer with C…

Dynamic Position Encoding for Transformers

Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years. Transformers (Vaswani et al., 2017) have radically changed it by proposing a novel architecture that relies on a feed-forward…

Computation and Language · Computer Science 2022-10-25 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

Improve Transformer Models with Better Relative Position Embeddings

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a…

Computation and Language · Computer Science 2020-09-30 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary

This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional…

Machine Learning · Computer Science 2024-11-28 Takashi Morita

Improving Transformers using Faithful Positional Encoding

We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing…

Machine Learning · Computer Science 2024-05-17 Tsuyoshi Idé, Jokin Labaien, Pin-Yu Chen

A Simple and Effective Positional Encoding for Transformers

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with…

Computation and Language · Computer Science 2021-11-04 Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this…

Machine Learning · Computer Science 2021-11-10 Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

Alternative positional encoding functions for neural transformers

A key module in neural transformer-based deep architectures is positional encoding. This module enables a suitable way to encode positional information as input for transformer neural layers. This success has been rooted in the use of…

Machine Learning · Computer Science 2025-12-23 Ezequiel Lopez-Rubio, Macoris Decena-Gimenez, Rafael Marcos Luque-Baena

Positional Encoding in Transformer-Based Time Series Models: A Survey

Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional…

Machine Learning · Computer Science 2026-05-07 Habib Irani, Vangelis Metsis

Deconstructing Positional Information: From Attention Logits to Training Biases

Positional encodings enable Transformers to incorporate sequential information, yet their theoretical understanding remains limited to two properties: distance attenuation and translation invariance. Because natural language lacks purely…

Machine Learning · Computer Science 2026-02-11 Zihan Gu, Ruoyu Chen, Han Zhang, Hua Zhang, Yue Hu

Language Modeling with Deep Transformers

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured…

Computation and Language · Computer Science 2019-09-25 Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney

On the Geometry of Positional Encodings in Transformers

Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance,…

Machine Learning · Computer Science 2026-04-08 Giansalvo Cirrincione

SeqPE: Transformer with Sequential Position Encoding

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

Positional Encoding to Control Output Sequence Length

Neural encoder-decoder models have been successful in natural language generation tasks. However, real applications of abstractive summarization must consider an additional constraint that a generated summary should not exceed a desired…

Computation and Language · Computer Science 2019-04-17 Sho Takase, Naoaki Okazaki

Randomized Positional Encodings Boost Length Generalization of Transformers

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply…

Machine Learning · Computer Science 2023-05-29 Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness

What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept emerging, and most focus on designing different pre-training objectives or variants of…

Computation and Language · Computer Science 2020-10-13 Yu-An Wang, Yun-Nung Chen

The Impact of Positional Encodings on Multilingual Compression

In order to preserve word-order information in a non-autoregressive setting, transformer architectures tend to include positional knowledge, by (for instance) adding positional encodings to token embeddings. Several modifications have been…

Computation and Language · Computer Science 2021-09-14 Vinit Ravishankar, Anders Søgaard

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

Transformers process tokens in parallel but are temporally shallow: at position $t$, each layer attends to key-value pairs computed based on the previous layer, yielding a depth capped by the number of layers. Recurrent models offer…

Machine Learning · Computer Science 2026-04-24 Costin-Andrei Oncescu, Depen Morwani, Samy Jelassi, Alexandru Meterez, Mujin Kwun, Sham Kakade

Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization

Positional encodings are a core part of transformer-based models, enabling processing of sequential data without recurrence. This paper presents a theoretical framework to analyze how various positional encoding methods, including…

Machine Learning · Computer Science 2025-06-10 Yin Li

A Local Information Criterion for Dynamical Systems

Encoding a sequence of observations is an essential task with many applications. The encoding can become highly efficient when the observations are generated by a dynamical system. A dynamical system imposes regularities on the observations…

Machine Learning · Statistics 2018-05-29 Arash Mehrjou, Friedrich Solowjow, Sebastian Trimpe, Bernhard Schölkopf

Learning Regularized Positional Encoding for Molecular Prediction

Machine learning has become a promising approach for molecular modeling. Positional quantities, such as interatomic distances and bond angles, play a crucial role in molecule physics. The existing works rely on careful manual design of…

Machine Learning · Computer Science 2022-11-24 Xiang Gao, Weihao Gao, Wenzhi Xiao, Zhirui Wang, Chong Wang, Liang Xiang