Related papers: HiRoPE: Length Extrapolation for Code Models Using…

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on…

Computation and Language · Computer Science 2024-12-13 Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools for software engineering, capable of performing various code-related tasks such as code generation, completion, and translation.…

Software Engineering · Computer Science 2026-02-26 Madhusudan Ghosh, Rishabh Gupta

Base of RoPE Bounds Context Length

Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the de facto choice for position embedding…

Computation and Language · Computer Science 2024-05-24 Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen

Scaling Laws of RoPE-based Extrapolation

The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by…

Computation and Language · Computer Science 2024-03-14 Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

VRoPE: Rotary Position Embedding for Video Large Language Models

Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames. Existing adaptations,…

Artificial Intelligence · Computer Science 2025-11-03 Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu

HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models

Vision-Language Models (VLMs) have made significant progress in multimodal tasks. However, their performance often deteriorates in long-context scenarios, particularly long videos. While Rotary Position Embedding (RoPE) has been widely…

Machine Learning · Computer Science 2025-10-09 Haoran Li, Yingjie Qin, Baoyuan Ou, Lai Xu, Ruiwen Xu

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

Recently, large language models (LLMs) have revolutionized Natural Language Processing (NLP). Pretrained LLMs, due to limited training context size, struggle with handling long token sequences, limiting their performance on various…

Computation and Language · Computer Science 2024-12-11 Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang

Extending LLMs' Context Window with 100 Samples

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context…

Computation and Language · Computer Science 2024-01-17 Yikai Zhang, Junlong Li, Pengfei Liu

LongEmbed: Extending Embedding Models for Long Context Retrieval

Embedding models play a pivotal role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k…

Computation and Language · Computer Science 2024-11-08 Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the…

Computation and Language · Computer Science 2025-12-09 Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu

LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training

Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies…

Computation and Language · Computer Science 2025-08-06 Sikui Zhang, Guangze Gao, Ziyun Gan, Chunfeng Yuan, Zefeng Lin, Houwen Peng, Bing Li, Weiming Hu

Wavelet-based Positional Representation for Long Context

In the realm of large-scale language models, a significant challenge arises when extrapolating sequences beyond the maximum allowable length. This is because the model's position embedding mechanisms are limited to positions encountered…

Computation and Language · Computer Science 2025-02-05 Yui Oka, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term…

Computation and Language · Computer Science 2024-12-06 Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two…

Computation and Language · Computer Science 2026-02-06 Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Large Language Models (LLMs) often struggle to process and generate coherent context when the number of input tokens exceeds the pre-trained length. Recent advancements in long-context extension have significantly expanded the context…

Computation and Language · Computer Science 2025-04-29 Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Periodic RoPE for Infinite Context LLMs

The ability to process ultra-long contexts is crucial for large language models (LLMs) to perform long-horizon tasks. While recent efforts have extended context windows to 1M and beyond, model performance degrades when sequence length…

Computation and Language · Computer Science 2026-05-28 Simin Huo

Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling

Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the "lost-in-the-middle" problem, where crucial information in the middle of the context is often…

Computation and Language · Computer Science 2025-03-07 Zhenghua Wang, Yiran Ding, Changze Lv, Zhibo Xu, Tianlong Li, Tianyuan Shi, Xiaoqing Zheng, Xuanjing Huang

The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLMs). It rotates dimensions in the query and the key vectors by different angles according to their positions in the input sequence.…

Computation and Language · Computer Science 2025-02-18 Ting-Rui Chiang, Dani Yogatama

YaRN: Efficient Context Window Extension of Large Language Models

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN…

Computation and Language · Computer Science 2026-02-10 Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole

Revisiting Multimodal Positional Encoding in Vision-Language Models

Multimodal position encoding is essential for vision-language models, yet there has been little systematic investigation into multimodal position encoding. We conduct a comprehensive analysis of multimodal Rotary Positional Embedding (RoPE)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai