Related papers: MEP: Multiple Kernel Learning Enhancing Relative P…

200 papers

Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training? We…

Computation and Language · Computer Science 2022-04-26 Ofir Press, Noah A. Smith, Mike Lewis

In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed…

Machine Learning · Computer Science 2023-10-31 Giorgio Angelotti

Transformer-based language models rely on positional encoding (PE) to handle token order and support context length extrapolation. However, existing PE methods lack theoretical clarity and rely on limited evaluation metrics to substantiate…

Computation and Language · Computer Science 2026-05-11 Arthur S. Bianchessi, Yasmin C. Aguirre, Rodrigo C. Barros, Lucas S. Kupssinskü

Modern deep neural networks achieved remarkable progress in medical image segmentation tasks. However, it has recently been observed that they tend to produce overconfident estimates, even in situations of high uncertainty, leading to…

Computer Vision and Pattern Recognition · Computer Science 2023-06-05 Agostina Larrazabal, Cesar Martinez, Jose Dolz, Enzo Ferrante

Large language models (LLMs) have revolutionized natural language processing, but their ability to process long sequences is fundamentally limited by the context window size during training. Existing length extrapolation methods often…

Artificial Intelligence · Computer Science 2026-01-13 Nitin Vetcha

Transformers often struggle to generalize to longer sequences than those seen during training, a limitation known as length extrapolation. Most existing Relative Positional Encoding (RPE) methods attempt to address this by introducing…

Computation and Language · Computer Science 2025-09-23 Ali Veisi, Hamidreza Amirzadeh, Amir Mansourian

Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning…

Machine Learning · Statistics 2026-05-29 Se Yoon Lee, Jae Kwang Kim

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Xin Liu, Zhongdao Wang, Yali Li, Shengjin Wang

Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative position…

Computation and Language · Computer Science 2022-10-14 Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we…

Machine Learning · Computer Science 2024-10-25 Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma

In the realm of large-scale language models, a significant challenge arises when extrapolating sequences beyond the maximum allowable length. This is because the model's position embedding mechanisms are limited to positions encountered…

Computation and Language · Computer Science 2025-02-05 Yui Oka, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term…

Computation and Language · Computer Science 2024-12-06 Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position…

Machine Learning · Computer Science 2025-06-18 Huayang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe

Referring Expression Comprehension (REC), which aims to ground a local visual region via natural language, is a task that heavily relies on multimodal alignment. Most existing methods utilize powerful pre-trained models to transfer…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin

Rotary Positional Embeddings (RoPE) have become the standard for Large Language Models (LLMs) due to their ability to encode relative positions through geometric rotation. However, we identify a significant limitation we term "Spectral…

Computation and Language · Computer Science 2026-02-02 Kanishk Awadhiya

Length extrapolation permits training a transformer language model on short sequences that preserves perplexities when tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to…

Computation and Language · Computer Science 2023-05-25 Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Positional encodings are a core part of transformer-based models, enabling processing of sequential data without recurrence. This paper presents a theoretical framework to analyze how various positional encoding methods, including…

Machine Learning · Computer Science 2025-06-10 Yin Li

Mixture-of-Experts (MoE) models are typically pre-trained with explicit load-balancing constraints to ensure statistically balanced expert routing. Despite this, we observe that even well-trained MoE models exhibit significantly imbalanced…

Machine Learning · Computer Science 2026-01-27 Xuan-Phi Nguyen, Shrey Pandit, Austin Xu, Caiming Xiong, Shafiq Joty

Large Language Models (LLMs) have been found to struggle with accurately retrieving key information. To address this, we propose Mask-Enhanced Autoregressive Prediction (MEAP), a simple yet effective training paradigm that seamlessly…

Computation and Language · Computer Science 2026-03-16 Xialie Zhuang, Zhikai Jia, Jianjin Li, Zhenyu Zhang, Li Shen, Zheng Cao, Shiwei Liu

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an…

Machine Learning · Computer Science 2024-06-18 Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He