
PRISM: Parallel Residual Iterative Sequence Model

Machine Learning 2026-02-13 v2

Abstract

Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretically bounded by shallow, single-step linear updates, while powerful iterative methods like Test-Time Training (TTT) break hardware parallelism because their gradients depend on the evolving state. We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We employ a Write-Forget Decoupling strategy that isolates non-linearity within the injection operator. To bypass the serial dependency of explicit solvers, PRISM uses a two-stage proxy architecture: a short convolution anchors the initial residual using local history energy, while a learned predictor estimates the refinement updates directly from the input. This design distills structural patterns associated with iterative correction into a parallelizable feedforward operator. Theoretically, we prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-1 bottleneck. Empirically, PRISM performs comparably to explicit optimization methods while delivering 174x higher throughput.
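The contrast between a single-step Rank-1 update and Rank-$L$ accumulation can be illustrated with elementary linear algebra: one outer product of a value vector and a key vector has rank 1, while a sum of L such outer products can reach rank L. The sketch below is only that generic illustration, not PRISM's operator; the helper names (`outer`, `rank`, `accumulated_update`) and the toy keys and values are hypothetical and chosen for this example.

```python
from fractions import Fraction


def outer(u, v):
    """Outer product u v^T as a list of rows, using exact fractions."""
    return [[Fraction(a) * Fraction(b) for b in v] for a in u]


def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        # Eliminate the column entry from every row below the pivot row.
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def accumulated_update(keys, values):
    """Sum of outer products v_i k_i^T over all (key, value) pairs."""
    total = [[Fraction(0)] * len(keys[0]) for _ in range(len(values[0]))]
    for k, v in zip(keys, values):
        term = outer(v, k)
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total


# Hypothetical toy data: dimension 4, L = 3 refinement terms.
keys = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
values = [[1, 2, 0, 1], [0, 1, 1, 0], [2, 0, 1, 3]]

single_step = accumulated_update(keys[:1], values[:1])  # one outer product
multi_step = accumulated_update(keys, values)           # three outer products

print(rank(single_step), rank(multi_step))
```

With linearly independent keys and values, as in this toy data, the accumulated matrix reaches rank 3, whereas the single outer product is confined to rank 1. In general, a sum of L outer products has rank at most L.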


Cite

@article{arxiv.2602.10796,
  title  = {PRISM: Parallel Residual Iterative Sequence Model},
  author = {Jie Jiang and Ke Cheng and Xin Xu and Mengyang Pang and Tianhao Lu and Jiaheng Li and Yue Liu and Yuan Wang and Jun Zhang and Huan Yu and Zhouchen Lin},
  journal= {arXiv preprint arXiv:2602.10796},
  year   = {2026}
}

Comments

21 pages, 2 figures
