Related papers: Wavelet-based Positional Representation for Long C…

On the token distance modeling ability of higher RoPE attention dimension

Length extrapolation algorithms based on Rotary position embedding (RoPE) have shown promising results in extending the context length of language models. However, understanding how position embedding can capture longer-range contextual…

Computation and Language · Computer Science 2024-10-22 Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

VRoPE: Rotary Position Embedding for Video Large Language Models

Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames. Existing adaptations,…

Artificial Intelligence · Computer Science 2025-11-03 Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

We identify intrinsic limitations of Rotary Positional Embeddings (RoPE) in Transformer-based long-context language models. Our theoretical analysis abstracts away from the specific content of the context and depends only on its length. We…

Computation and Language · Computer Science 2026-05-18 Yufeng Du, Phillip Harris, Minyang Tian, Eliu A Huerta, Srikanth Ronanki, Subendhu Rongali, Aram Galstyan, Hao Peng

Beyond Position: the emergence of wavelet-like properties in Transformers

This paper studies how Transformer models with Rotary Position Embeddings (RoPE) develop emergent, wavelet-like properties that compensate for the positional encoding's theoretical limitations. Through an analysis spanning model scales,…

Machine Learning · Computer Science 2025-06-06 Valeria Ruscio, Umberto Nanni, Fabrizio Silvestri

HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models

Vision-Language Models (VLMs) have made significant progress in multimodal tasks. However, their performance often deteriorates in long-context scenarios, particularly long videos. While Rotary Position Embedding (RoPE) has been widely…

Machine Learning · Computer Science 2025-10-09 Haoran Li, Yingjie Qin, Baoyuan Ou, Lai Xu, Ruiwen Xu

HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models

Positional encoding mechanisms enable Transformers to model sequential structure and long-range dependencies in text. While absolute positional encodings struggle with extrapolation to longer sequences due to fixed positional…

Computation and Language · Computer Science 2025-09-09 Chang Dai, Hongyu Shan, Mingyang Song, Di Liang

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on…

Computation and Language · Computer Science 2024-12-13 Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

Extending LLMs' Context Window with 100 Samples

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context…

Computation and Language · Computer Science 2024-01-17 Yikai Zhang, Junlong Li, Pengfei Liu

The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLM). It rotates dimensions in the query and the key vectors by different angles according to their positions in the input sequence.…

Computation and Language · Computer Science 2025-02-18 Ting-Rui Chiang, Dani Yogatama

Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the…

Computation and Language · Computer Science 2025-12-09 Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu

DoPE: Denoising Rotary Position Embedding

Positional encoding is essential for large language models (LLMs) to represent sequence order, yet recent studies show that Rotary Position Embedding (RoPE) can induce massive activation. We investigate the source of these instabilities via…

Computation and Language · Computer Science 2026-01-07 Jing Xiong, Liyang Fan, Hui Shen, Zunhai Su, Min Yang, Lingpeng Kong, Ngai Wong

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Extending context window sizes allows large language models (LLMs) to process longer sequences and handle more complex tasks. Rotary Positional Embedding (RoPE) has become the de facto standard due to its relative positional encoding…

Computation and Language · Computer Science 2024-11-27 Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang

Positional Encoding via Token-Aware Phase Attention

We prove under practical assumptions that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long-context. RoPE extension methods may alleviate this…

Computation and Language · Computer Science 2026-05-12 Yu Wang, Sheng Shen, Rémi Munos, Hongyuan Zhan, Yuandong Tian

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term…

Computation and Language · Computer Science 2024-12-06 Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

PSC: Extending Context Window of Large Language Models via Phase Shift Calibration

Rotary Position Embedding (RoPE) is an efficient position encoding approach and is widely utilized in numerous large language models (LLMs). Recently, a lot of methods have been put forward to further expand the context window based on…

Computation and Language · Computer Science 2025-05-20 Wenqiao Zhu, Chao Xu, Lulu Wang, Jun Wu

Rotary Offset Features in Large Language Models

Transformer-based Large Language Models (LLMs) rely on positional encodings to provide sequence position information to their attention mechanism. Rotary Positional Encodings (RoPE), which encode relative position by rotating queries and…

Computation and Language · Computer Science 2025-08-25 André Jonasson

YaRN: Efficient Context Window Extension of Large Language Models

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN…

Computation and Language · Computer Science 2026-02-10 Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole

Context-aware Rotary Position Embedding

Positional encoding is a vital component of Transformer architectures, enabling models to incorporate sequence order into self-attention mechanisms. Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their…

Computation and Language · Computer Science 2025-08-01 Ali Veisi, Delaram Fartoot, Hamidreza Amirzadeh

RoPE Attention Can Be Trained in Almost Linear Time

The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, which enables models to capture token relationships when encoding positional information. However, the RoPE mechanisms make…

Machine Learning · Computer Science 2026-01-27 Yang Cao, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song

LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training

Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies…

Computation and Language · Computer Science 2025-08-06 Sikui Zhang, Guangze Gao, Ziyun Gan, Chunfeng Yuan, Zefeng Lin, Houwen Peng, Bing Li, Weiming Hu