Related papers: Improving Transformers using Faithful Positional E…

Positional Encoding in Transformer-Based Time Series Models: A Survey

Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional…

Machine Learning · Computer Science 2026-05-07 Habib Irani, Vangelis Metsis

Transformer with Tree-order Encoding for Neural Program Generation

While a considerable amount of semantic parsing approaches have employed RNN architectures for code generation tasks, there have been only few attempts to investigate the applicability of Transformers for this task. Including hierarchical…

Computation and Language · Computer Science 2022-06-28 Klaudia-Doris Thellmann, Bernhard Stadler, Ricardo Usbeck, Jens Lehmann

Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary

This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional…

Machine Learning · Computer Science 2024-11-28 Takashi Morita

Alternative positional encoding functions for neural transformers

A key module in neural transformer-based deep architectures is positional encoding. This module enables a suitable way to encode positional information as input for transformer neural layers. This success has been rooted in the use of…

Machine Learning · Computer Science 2025-12-23 Ezequiel Lopez-Rubio, Macoris Decena-Gimenez, Rafael Marcos Luque-Baena

Learning to Encode Position for Transformer with Continuous Dynamical Model

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent models are…

Machine Learning · Computer Science 2020-03-23 Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this…

Machine Learning · Computer Science 2021-11-10 Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

Improve Transformer Models with Better Relative Position Embeddings

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a…

Computation and Language · Computer Science 2020-09-30 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agnostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-15 Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

A Simple and Effective Positional Encoding for Transformers

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with…

Computation and Language · Computer Science 2021-11-04 Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

Randomized Positional Encodings Boost Length Generalization of Transformers

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply…

Machine Learning · Computer Science 2023-05-29 Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness

An Augmented Transformer Architecture for Natural Language Generation Tasks

The Transformer based neural networks have been showing significant advantages on most evaluations of various natural language processing and other sequence-to-sequence tasks due to its inherent architecture based superiorities. Although…

Computation and Language · Computer Science 2019-10-31 Hailiang Li, Adele Y. C. Wang, Yang Liu, Du Tang, Zhibin Lei, Wenye Li

Explicit Reordering for Neural Machine Translation

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve…

Computation and Language · Computer Science 2020-04-09 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Algebraic Positional Encodings

We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an…

Machine Learning · Computer Science 2024-11-01 Konstantinos Kogkalidis, Jean-Philippe Bernardy, Vikas Garg

Improved Positional Encoding for Implicit Neural Representation based Compact Data Representation

Positional encodings are employed to capture the high frequency information of the encoded signals in implicit neural representation (INR). In this paper, we propose a novel positional encoding method which improves the reconstruction…

Computer Vision and Pattern Recognition · Computer Science 2023-11-13 Bharath Bhushan Damodaran, Francois Schnitzler, Anne Lambert, Pierre Hellier

Position Information in Transformers: An Overview

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential and word order is…

Computation and Language · Computer Science 2021-09-10 Philipp Dufter, Martin Schmitt, Hinrich Schütze

Towards More Efficient Insertion Transformer with Fractional Positional Encoding

Auto-regressive neural sequence models have been shown to be effective across text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. Insertion Transformer (Stern et al., 2019) is an…

Computation and Language · Computer Science 2023-02-01 Zhisong Zhang, Yizhe Zhang, Bill Dolan

Transformer Conformal Prediction for Time Series

We present a conformal prediction method for time series using the Transformer architecture to capture long-memory and long-range dependencies. Specifically, we use the Transformer decoder as a conditional quantile estimator to predict the…

Machine Learning · Computer Science 2024-06-11 Junghwan Lee, Chen Xu, Yao Xie

On the Geometry of Positional Encodings in Transformers

Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance,…

Machine Learning · Computer Science 2026-04-08 Giansalvo Cirrincione

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem…

Machine Learning · Computer Science 2024-06-06 Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel, Ruben Ohana, Liam Holden Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho

Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings

Transformers with causal attention can solve tasks that require positional information without using positional encodings. In this work, we propose and investigate a new hypothesis about how positional information can be stored without…

Computation and Language · Computer Science 2025-01-03 Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy