Related papers: Dynamic Position Encoding for Transformers

200 papers

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent models are…

Machine Learning · Computer Science 2020-03-23 Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). An absolute positional embedding method that can extrapolate to sequences of lengths longer than the ones it…

Computation and Language · Computer Science 2025-10-06 Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a…

Computation and Language · Computer Science 2020-09-30 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with…

Computation and Language · Computer Science 2021-11-04 Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional…

Computation and Language · Computer Science 2021-09-28 Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information. Absolute…

Machine Learning · Computer Science 2021-11-10 Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov

Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various…

Computation and Language · Computer Science 2023-11-09 Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential and word order is…

Computation and Language · Computer Science 2021-09-10 Philipp Dufter, Martin Schmitt, Hinrich Schütze

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve…

Computation and Language · Computer Science 2020-04-09 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept breaking out, and most focus on designing different pre-training objectives or variants of…

Computation and Language · Computer Science 2020-10-13 Yu-An Wang, Yun-Nung Chen

Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across…

Machine Learning · Computer Science 2025-08-27 Avinash Amballa

Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize…

Machine Learning · Computer Science 2025-11-18 Chase van de Geijn, Ayush Paliwal, Timo Lüddecke, Alexander S. Ecker

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce…

Computation and Language · Computer Science 2025-08-22 Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to…

Computation and Language · Computer Science 2024-11-06 Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly…

Machine Learning · Computer Science 2026-05-07 Habib Irani, Vangelis Metsis

Vision transformers have demonstrated significant advantages in computer vision tasks due to their ability to capture long-range dependencies and contextual relationships through self-attention. However, existing position encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

Adapting Deep Learning (DL) techniques to automate non-trivial coding activities, such as code documentation and defect detection, has been intensively studied recently. Learning to predict code changes is one of the popular and essential…

Software Engineering · Computer Science 2022-08-02 Shiyi Qi, Yaoxian Li, Cuiyun Gao, Xiaohong Su, Shuzheng Gao, Zibin Zheng, Chuanyi Liu

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in…

Computation and Language · Computer Science 2020-11-24 Liang Ding, Longyue Wang, Dacheng Tao

Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance,…

Machine Learning · Computer Science 2026-04-08 Giansalvo Cirrincione