Group Representational Position Encoding

Machine Learning 2026-05-15 v6 Artificial Intelligence Computation and Language

Abstract

We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n) = \exp(n \, \omega \, \mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Overall, GRAPE provides a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project page: https://github.com/model-architectures/GRAPE.
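The two mechanisms in the abstract can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation. It assumes the rank-2 skew-symmetric generator is written as $\mathbf{L} = a b^\top - b a^\top$ for two vectors `a` and `b`, which is one standard parameterization and may differ from the one used in the paper. Under that assumption $\mathbf{L}^3 = -s^2 \mathbf{L}$ with $s^2 = \|a\|^2 \|b\|^2 - (a \cdot b)^2$, so the exponential has a Rodrigues-type closed form. The additive part uses a hypothetical 2x2 unipotent block acting on two extra coordinates to show how a bias linear in relative position can arise. All function names are illustrative.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def rank2_generator(a, b):
    """Assumed parameterization: L = a b^T - b a^T (skew-symmetric, rank <= 2)."""
    d = len(a)
    return [[a[i] * b[j] - b[i] * a[j] for j in range(d)] for i in range(d)]

def grape_rotation(n, omega, a, b):
    """Closed-form G(n) = exp(n * omega * L), using L^3 = -s^2 L."""
    d = len(a)
    L = rank2_generator(a, b)
    L2 = matmul(L, L)
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    s = math.sqrt(max(aa * bb - ab * ab, 0.0))  # magnitude of the eigenvalues +/- i s
    if s == 0.0:
        return identity(d)  # a and b are parallel, so L = 0
    theta = n * omega * s
    c1 = math.sin(theta) / s
    c2 = (1.0 - math.cos(theta)) / (s * s)
    I = identity(d)
    return [[I[i][j] + c1 * L[i][j] + c2 * L2[i][j] for j in range(d)]
            for i in range(d)]

def unipotent(n):
    """U(n) = exp(n N) = I + n N for the nilpotent N = [[0, 1], [0, 0]]."""
    return [[1.0, float(n)], [0.0, 1.0]]

def additive_logit(q, k, i, j, slope):
    """Content logit q.k plus a bias from the unipotent action (illustrative).

    The extra query coordinates are [slope, 0] and the extra key
    coordinates are [0, 1]; the bias is [slope, 0] U(-i) U(j) [0, 1]^T.
    """
    U = matmul(unipotent(-i), unipotent(j))  # equals unipotent(j - i)
    bias = slope * U[0][1]                   # slope * (j - i)
    return sum(x * y for x, y in zip(q, k)) + bias
```

With `a` and `b` set to a canonical coordinate pair, `grape_rotation` reduces to a 2x2 plane rotation by angle `n * omega`, which is the building block of RoPE. For a query at position `i` and a key at position `j`, `additive_logit` adds `-slope * (i - j)`, the ALiBi-style linear bias, and depends on positions only through `j - i`.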

Cite

@article{arxiv.2512.07805,
  title  = {Group Representational Position Encoding},
  author = {Yifan Zhang and Zixiang Chen and Yifeng Liu and Zhen Qin and Huizhuo Yuan and Kangping Xu and Yang Yuan and Quanquan Gu and Andrew Chi-Chih Yao},
  journal= {arXiv preprint arXiv:2512.07805},
  year   = {2026}
}

Comments

Published in ICLR 2026. Project Page: https://github.com/model-architectures/GRAPE
