Nested AutoRegressive Models

Computer Vision and Pattern Recognition 2025-10-28 v1 Artificial Intelligence

Abstract

AutoRegressive (AR) models have demonstrated competitive performance in image generation, achieving results comparable to those of diffusion models. However, their token-by-token generation mechanism remains computationally intensive, and existing solutions such as VAR often lead to limited sample diversity. In this work, we propose the Nested AutoRegressive (NestAR) model, which uses nested AR architectures to generate images. NestAR arranges multi-scale modules in a hierarchical order. The modules themselves form an AR chain, in which each larger-scale module is conditioned on the outputs of the preceding smaller-scale module. Within each module, NestAR uses another AR structure to generate "patches" of tokens. The proposed nested AR architecture reduces the overall complexity of generating $n$ image tokens from $\mathcal{O}(n)$ to $\mathcal{O}(\log n)$ and increases image diversity. NestAR further incorporates a flow matching loss to use continuous tokens, and develops objectives to coordinate these multi-scale modules during training. NestAR achieves competitive image generation performance while significantly lowering computational cost.
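To make the complexity claim concrete, the following is a minimal sketch that counts sequential generation steps, not an implementation of NestAR. It rests on assumptions that the abstract does not state: that each module covers `growth` times as many tokens as the previous one, and that each module emits its tokens in a fixed number of patch steps. The names `nested_steps`, `growth`, and `patches_per_module` are hypothetical.

```python
def token_by_token_steps(n: int) -> int:
    """Sequential steps for a plain AR model: one step per token."""
    return n


def nested_steps(n: int, growth: int = 4, patches_per_module: int = 4) -> int:
    """Sequential steps under an ASSUMED nested schedule.

    Assumption (not from the abstract): module k covers growth**k tokens,
    and every module needs the same fixed number of AR patch steps.
    """
    modules = 0
    covered = 0  # tokens produced by the modules counted so far
    size = 1     # tokens produced by the next module
    while covered < n:
        covered += size
        size *= growth
        modules += 1
    # Outer AR chain over modules times inner AR chain over patches.
    return modules * patches_per_module


for n in (256, 1024, 4096):
    print(n, token_by_token_steps(n), nested_steps(n))
```

Under these assumptions, quadrupling $n$ adds one module and therefore a constant number of steps, which is the logarithmic growth the abstract describes, whereas the token-by-token count grows linearly.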

Cite

@article{arxiv.2510.23028,
  title  = {Nested AutoRegressive Models},
  author = {Hongyu Wu and Xuhui Fan and Zhangkai Wu and Longbing Cao},
  journal= {arXiv preprint arXiv:2510.23028},
  year   = {2025}
}