
LoopQ: Quantization for Recursive Transformers

Machine Learning; Artificial Intelligence · 2026-05-19 · v1

Abstract

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization (4-bit weights and 4-bit activations), LoopQ improves average downstream accuracy by 68.8% and reduces average perplexity by 87.7% compared with the strongest static PTQ baseline.
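To make recursive error accumulation concrete, here is a minimal Python sketch. It is not LoopQ and does not reproduce any of its components. It assumes a toy shared block `h <- tanh(W h)`, symmetric round-to-nearest fake quantization (per row for weights, over the whole vector for activations) as a plain static PTQ baseline, and hypothetical helper names (`fake_quant`, `looped_block`, `loop_errors`). One quantized weight matrix is reused at every iteration, so the rounding error from one loop becomes part of the input to the next.

```python
import math
import random


def fake_quant(values, bits=4):
    """Symmetric fake quantization over the given list: round to a signed
    `bits`-bit integer grid, then map back to floats (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def looped_block(W, h, loops, quantize=False, bits=4):
    """Apply the same shared block `loops` times and return every state.
    Toy assumption: the block is h <- tanh(W h)."""
    if quantize:
        # Per-row (per-output-channel) weight quantization, done once and
        # shared by all loop iterations, as a static PTQ baseline would.
        W = [fake_quant(row, bits) for row in W]
    states = [list(h)]
    for _ in range(loops):
        # Quantize the block's input activation at every iteration.
        x = fake_quant(h, bits) if quantize else h
        h = [math.tanh(v) for v in matvec(W, x)]
        states.append(h)
    return states


def loop_errors(d=16, loops=8, bits=4, seed=0):
    """Relative L2 error of the quantized trajectory vs. full precision,
    one value per state (index 0 is the shared input state)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.5 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]
    h0 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    ref = looped_block(W, h0, loops)
    quant = looped_block(W, h0, loops, quantize=True, bits=bits)
    errors = []
    for r, q in zip(ref, quant):
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r, q)))
        norm = math.sqrt(sum(a * a for a in r))
        errors.append(diff / max(norm, 1e-12))
    return errors


if __name__ == "__main__":
    for bits in (4, 8):
        print(bits, [round(e, 4) for e in loop_errors(bits=bits)])
```

`loop_errors(loops=8)` returns nine values, the first being 0.0 for the shared input state. Comparing `bits=4` with `bits=8` shows how precision affects the gap between the two trajectories. The numbers come from random toy weights and say nothing about the results reported for LoopQ.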


Cite

@article{arxiv.2605.16343,
  title  = {LoopQ: Quantization for Recursive Transformers},
  author = {Rui Fang and Hsi-Wen Chen and Ming-Syan Chen},
  journal = {arXiv preprint arXiv:2605.16343},
  year   = {2026}
}