Related papers: HiRoPE: Length Extrapolation for Code Models Using…

200 papers

Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on…

Computation and Language · Computer Science · 2024-12-13 · Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code generation, completion, and translation.…

Software Engineering · Computer Science · 2026-02-26 · Madhusudan Ghosh, Rishabh Gupta

Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the de facto choice for position embedding…

Computation and Language · Computer Science · 2024-05-24 · Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen

The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by…

Computation and Language · Computer Science · 2024-03-14 · Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames. Existing adaptations,…

Artificial Intelligence · Computer Science · 2025-11-03 · Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu

Vision-Language Models (VLMs) have made significant progress in multimodal tasks. However, their performance often deteriorates in long-context scenarios, particularly long videos. While Rotary Position Embedding (RoPE) has been widely…

Machine Learning · Computer Science · 2025-10-09 · Haoran Li, Yingjie Qin, Baoyuan Ou, Lai Xu, Ruiwen Xu

Recently, large language models (LLMs) have revolutionized Natural Language Processing (NLP). Pretrained LLMs, due to limited training context size, struggle with handling long token sequences, limiting their performance on various…

Computation and Language · Computer Science · 2024-12-11 · Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context…

Computation and Language · Computer Science · 2024-01-17 · Yikai Zhang, Junlong Li, Pengfei Liu

Embedding models play a pivotal role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k…

Computation and Language · Computer Science · 2024-11-08 · Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the…

Computation and Language · Computer Science · 2025-12-09 · Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu

Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies…

Computation and Language · Computer Science · 2025-08-06 · Sikui Zhang, Guangze Gao, Ziyun Gan, Chunfeng Yuan, Zefeng Lin, Houwen Peng, Bing Li, Weiming Hu

In the realm of large-scale language models, a significant challenge arises when extrapolating sequences beyond the maximum allowable length. This is because the model's position embedding mechanisms are limited to positions encountered…

Computation and Language · Computer Science · 2025-02-05 · Yui Oka, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term…

Computation and Language · Computer Science · 2024-12-06 · Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two…

Computation and Language · Computer Science · 2026-02-06 · Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang

Large Language Models (LLMs) often struggle to process and generate coherent context when the number of input tokens exceeds the pre-trained length. Recent advancements in long-context extension have significantly expanded the context…

Computation and Language · Computer Science · 2025-04-29 · Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

The ability to process ultra-long contexts is crucial for large language models (LLMs) to perform long-horizon tasks. While recent efforts have extended context windows to 1M and beyond, model performance degrades when sequence length…

Computation and Language · Computer Science · 2026-05-28 · Simin Huo

Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the "lost-in-the-middle" problem, where crucial information in the middle of the context is often…

Computation and Language · Computer Science · 2025-03-07 · Zhenghua Wang, Yiran Ding, Changze Lv, Zhibo Xu, Tianlong Li, Tianyuan Shi, Xiaoqing Zheng, Xuanjing Huang

The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLMs). It rotates dimensions in the query and key vectors by different angles according to their positions in the input sequence.…

Computation and Language · Computer Science · 2025-02-18 · Ting-Rui Chiang, Dani Yogatama

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN…

Computation and Language · Computer Science · 2026-02-10 · Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole

Multimodal position encoding is essential for vision-language models, yet there has been little systematic investigation into multimodal position encoding. We conduct a comprehensive analysis of multimodal Rotary Positional Embedding (RoPE)…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai