Computation and Language · Computer Science
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, Mike Lewis
2022-04-26
Computation and Language · Computer Science
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
2023-06-29
Computation and Language · Computer Science
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu +4
2024-12-13
Computation and Language · Computer Science
Wavelet-based Positional Representation for Long Context
Yui Oka, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito
2025-02-05
Computation and Language · Computer Science
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An +2
2024-03-14
Computation and Language · Computer Science
A Length-Extrapolatable Transformer
Yutao Sun, Li Dong, Barun Patra, Shuming Ma +5
2022-12-21
Computation and Language · Computer Science
On the token distance modeling ability of higher RoPE attention dimension
Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng +3
2024-10-22
Machine Learning · Computer Science
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma
2024-10-25
Machine Learning · Computer Science
Model Extrapolation Expedites Alignment
Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang +1
2025-06-02
Computation and Language · Computer Science
DoPE: Denoising Rotary Position Embedding
Jing Xiong, Liyang Fan, Hui Shen, Zunhai Su +3
2026-01-07
Computation and Language · Computer Science
Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge
2023-05-25
Computation and Language · Computer Science
ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva
2025-10-06
Computation and Language · Computer Science
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang +5
2025-12-09
Machine Learning · Computer Science
Functional Interpolation for Relative Positions Improves Long Context Transformers
Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie +6
2024-03-05
Computation and Language · Computer Science
A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han
2025-06-02
Computation and Language · Computer Science
Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
Liang Zhao, Xiachong Feng, Xiaocheng Feng, Weihong Zhong +5
2024-10-08
Machine Learning · Computer Science
Location Attention for Extrapolation to Longer Sequences
Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni
2020-04-23
Machine Learning · Computer Science
Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling
Ye Qiao, Haocheng Xu, Xiaofan Zhang, Sitao Huang
2025-10-02
Machine Learning · Computer Science
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang +5
2024-06-18
Computer Vision and Pattern Recognition · Computer Science
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi +1
2024-01-19