Lizard: An Efficient Linearization Framework for Large Language Models

Chien Van Nguyen; Huy Nguyen; Ruiyi Zhang; Hanieh Deilamsalehy; Puneet Mathur; Viet Dac Lai; Haoliang Wang; Jayakumar Subramanian; Ryan A. Rossi; Trung Bui; Nikos Vlassis; Franck Dernoncourt; Thien Huu Nguyen


Computation and Language; Machine Learning | 2026-04-21 (v4)



Abstract

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks on long sequences because softmax attention has quadratic complexity and the Key-Value (KV) cache grows with context length, which makes inference memory-bound. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving model quality. Unlike prior linearization methods constrained by fixed, non-adaptive structures, Lizard augments the architecture with compact, learnable modules that enable adaptive memory control and robust length generalization. Moreover, we introduce a hardware-aware algorithm that solves numerical instability in gated attention to accelerate training. Extensive experiments show that Lizard achieves near-lossless recovery of its teacher model's performance, significantly outperforming previous methods by 9.4 to 24.5 points on the 5-shot MMLU benchmark and demonstrating superior associative recall.
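To make the memory argument in the abstract concrete, the following is a minimal sketch contrasting a softmax-attention step, which reads a KV cache that grows with every token, against a generic gated linear-attention recurrence, which keeps a fixed-size state. It is not Lizard's actual mechanism: the update rule `S <- gate * S + k v^T`, the scalar `gate`, and all function names are assumptions chosen for illustration only.

```python
import math

def softmax_attention_step(q, keys, values):
    """Attend one query over all cached keys/values.

    Work and memory per token grow with the number of cached entries.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(d_v)]

def gated_linear_step(state, q, k, v, gate):
    """One step of a generic gated linear-attention recurrence.

    Hypothetical rule (not taken from the paper): S <- gate * S + k v^T,
    output = q^T S. The state is a d_k x d_v matrix of constant size.
    """
    d_k, d_v = len(k), len(v)
    new_state = [[gate * state[i][j] + k[i] * v[j] for j in range(d_v)]
                 for i in range(d_k)]
    out = [sum(q[i] * new_state[i][j] for i in range(d_k))
           for j in range(d_v)]
    return new_state, out

def stored_floats(seq_len, d_k, d_v):
    """Floats held per head: (growing KV cache, constant recurrent state)."""
    return seq_len * (d_k + d_v), d_k * d_v
```

For example, under these assumptions `stored_floats(1000, 4, 4)` gives 8000 floats for the KV cache versus 16 for the recurrent state; the first number scales with context length and the second does not.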

Keywords

key-value cache; large language model; training; attention mechanism

Cite

@article{arxiv.2507.09025,
  title  = {Lizard: An Efficient Linearization Framework for Large Language Models},
  author = {Chien Van Nguyen and Huy Nguyen and Ruiyi Zhang and Hanieh Deilamsalehy and Puneet Mathur and Viet Dac Lai and Haoliang Wang and Jayakumar Subramanian and Ryan A. Rossi and Trung Bui and Nikos Vlassis and Franck Dernoncourt and Thien Huu Nguyen},
  journal= {arXiv preprint arXiv:2507.09025},
  year   = {2026}
}

Comments

ACL 2026 (Main)
