Related papers: Exploring Transformer Extrapolation

200 papers

Built upon the Transformer, large language models (LLMs) have captured worldwide attention due to their remarkable abilities. Nevertheless, all Transformer-based models including LLMs suffer from a preset length limit and can hardly…

Computation and Language · Computer Science · 2024-10-08 · Liang Zhao, Xiachong Feng, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu

Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on…

Computation and Language · Computer Science · 2024-12-13 · Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

Transformers often struggle to generalize to longer sequences than those seen during training, a limitation known as length extrapolation. Most existing Relative Positional Encoding (RPE) methods attempt to address this by introducing…

Computation and Language · Computer Science · 2025-09-23 · Ali Veisi, Hamidreza Amirzadeh, Amir Mansourian

Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then…

Computation and Language · Computer Science · 2022-12-21 · Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). An absolute positional embedding method that can extrapolate to sequences of lengths longer than the ones it…

Computation and Language · Computer Science · 2025-10-06 · Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively…

Machine Learning · Computer Science · 2024-02-15 · Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling…

Machine Learning · Computer Science · 2026-02-17 · Atsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo

Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based…

Machine Learning · Computer Science · 2022-10-31 · Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

In this study, we investigate the impact of positional encoding (PE) on source separation performance and the generalization ability to long sequences (length extrapolation) in Transformer-based time-frequency (TF) domain dual-path models.…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-03 · Kohei Saijo, Tetsuji Ogawa

Length extrapolation permits training a transformer language model on short sequences that preserves perplexities when tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to…

Computation and Language · Computer Science · 2023-05-25 · Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits…

The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-18 · Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

In the realm of large-scale language models, a significant challenge arises when extrapolating sequences beyond the maximum allowable length. This is because the model's position embedding mechanisms are limited to positions encountered…

Computation and Language · Computer Science · 2025-02-05 · Yui Oka, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito

The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by…

Computation and Language · Computer Science · 2024-03-14 · Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to…

Computation and Language · Computer Science · 2024-11-06 · Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative position…

Computation and Language · Computer Science · 2022-10-14 · Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky

Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks, however, their exceptional capabilities are restricted within the preset context window of Transformer. Position Embedding…

Computation and Language · Computer Science · 2024-03-26 · Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

Length extrapolation algorithms based on Rotary position embedding (RoPE) have shown promising results in extending the context length of language models. However, understanding how position embedding can capture longer-range contextual…

Computation and Language · Computer Science · 2024-10-22 · Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

Transformer language models have demonstrated impressive generalization capabilities in natural language domains, yet we lack a fine-grained understanding of how such generalization arises. In this paper, we investigate length…

Computation and Language · Computer Science · 2025-08-05 · Ziyang Cai, Nayoung Lee, Avi Schwarzschild, Samet Oymak, Dimitris Papailiopoulos

This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with…

Computation and Language · Computer Science · 2024-09-05 · Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu