Related papers: Efficient Matrix Implementation for Rotary Positio…

Rotary Position Embedding for Vision Transformer

Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

The Transformer architecture has revolutionized various fields since it was proposed, and its effectiveness largely depends on the ability to encode positional information. Traditional position encoding methods exhibit significant…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Huaisong Zhang, Chun Yuan

URoPE: Universal Relative Position Embedding across Geometric Spaces

Relative position embedding has become a standard mechanism for encoding positional information in Transformers. However, existing formulations are typically limited to a fixed geometric space, namely 1D sequences or regular 2D/3D grids,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Yichen Xie, Depu Meng, Chensheng Peng, Yihan Hu, Quentin Herau, Masayoshi Tomizuka, Wei Zhan

Rotary Masked Autoencoders are Versatile Learners

Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked…

Machine Learning · Computer Science 2026-05-13 Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi, Martín de los Rios, Gabriella Contardo, Roberto Trotta

Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE

Rotary Positional Embedding (RoPE) is a common choice in transformer architectures for encoding relative positional information. Although earlier work has examined omitting RoPE in specific layers, the effect of varying the fraction of…

Machine Learning · Computer Science 2026-03-13 Mohammad Aflah Khan, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander

Context-aware Rotary Position Embedding

Positional encoding is a vital component of Transformer architectures, enabling models to incorporate sequence order into self-attention mechanisms. Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their…

Computation and Language · Computer Science 2025-08-01 Ali Veisi, Delaram Fartoot, Hamidreza Amirzadeh

Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding

Rotary Position Embedding (RoPE) is widely adopted in large language models (LLMs) due to its efficient encoding of relative positions with strong extrapolation capabilities. However, while its application in higher-dimensional input…

Machine Learning · Computer Science 2025-07-16 Haiping Liu, Lijing Lin, Jingyuan Sun, Zhegong Shangguan, Mauricio A. Alvarez, Hongpeng Zhou

Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane

Rotary Position Embedding (RoPE) is the de facto positional encoding in large language models due to its ability to encode relative positions and support length extrapolation. When adapted to vision transformers, the standard axial…

Computer Vision and Pattern Recognition · Computer Science 2026-02-04 Haoyu Liu, Sucheng Ren, Tingyu Zhu, Peng Wang, Cihang Xie, Alan Yuille, Zeyu Zheng, Feng Wang

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform

The transformer architecture has been widely applied to many machine learning tasks. A main bottleneck in the time to perform transformer computations is a task called attention computation. [Alman and Song, NeurIPS 2023] have shown that in…

Machine Learning · Computer Science 2025-05-20 Josh Alman, Zhao Song

CRoPE: Efficient Parametrization of Rotary Positional Embedding

Rotary positional embedding has become the state-of-the-art approach to encode position information in transformer-based models. While it is often succinctly expressed in complex linear algebra, we note that the actual implementation of…

Machine Learning · Computer Science 2026-04-02 Beicheng Lou, Zifei Xu, Vivian W. H. Wong

RoFormer: Enhanced Transformer with Rotary Position Embedding

Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various…

Computation and Language · Computer Science 2023-11-09 Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

GeoPE: A Unified Geometric Positional Embedding for Structured Tensors

Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation, often treating spatially distant patches (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-05 Yupu Yao, Bowen Yang

RoPE Attention Can Be Trained in Almost Linear Time

The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, which enables models to capture token relationships when encoding positional information. However, the RoPE mechanisms make…

Machine Learning · Computer Science 2026-01-27 Yang Cao, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song

Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models

Rotary Position Embedding (RoPE) is widely adopted in large language models, but when applied to vision-language models (VLMs) it couples text and image position indices and can introduce spurious cross-modal relative-position bias. We…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Chengcheng Wang, Jianyuan Guo, Hongguang Li, Yuchuan Tian, Ying Nie, Chang Xu, Kai Han

A Circular Argument: Does RoPE need to be Equivariant for Vision?

Rotary Positional Encodings (RoPE) have emerged as a highly effective technique for one-dimensional sequences in Natural Language Processing, spurring recent progress towards generalizing RoPE to higher-dimensional data such as images and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Chase van de Geijn, Timo Lüddecke, Polina Turishcheva, Alexander S. Ecker

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial information otherwise lost due to the permutation invariant nature of self attention. While absolute positional embeddings (APE) have shown…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Md Abtahi Majeed Chowdhury, Md Rifat Ur Rahman, Akil Ahmad Taki

Circuit Complexity Bounds for RoPE-based Transformer Architecture

Characterizing the expressive power of the Transformer architecture is critical to understanding its capacity limits and scaling law. Recent works provide circuit complexity bounds for Transformer-like architectures. On the other hand,…

Machine Learning · Computer Science 2024-12-03 Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

The attention mechanism in a Transformer architecture matches key to query based on both content -- the what -- and position in a sequence -- the where. We present an analysis indicating that what and where are entangled in the popular RoPE…

Machine Learning · Computer Science 2025-12-24 Anand Gopalakrishnan, Robert Csordás, Jürgen Schmidhuber, Michael C. Mozer

LieRE: Lie Rotational Positional Encodings

Transformer architectures rely on position encodings to model the spatial structure of input data. Rotary Position Encoding (RoPE) is a widely used method in language models that encodes relative positions through fixed, block-diagonal,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay Chaudhari, Curtis Langlotz

Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers

Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains poorly characterized. In this work, we reinterpret…

Machine Learning · Computer Science 2026-02-12 Feilong Liu