Structured State-Space Regularization for Generation-Friendly Image Tokenization

Computer Vision and Pattern Recognition 2026-05-20 v2

Abstract

Image tokenizers play a central role in modern generative models, where the structure of the latent space critically determines downstream generation performance. A key but underexplored property of effective latent representations is spectral organization: the ability to encode information across frequency components. In this work, we introduce structured state-space regularization, a principled approach to inducing spectral structure in latent spaces. We derive the regularization objective by revisiting state-space models (SSMs) as systems that mimic the behavior of a basis function. This perspective reveals that the hidden states of SSMs are driven to capture frequency components, which yields a novel regularizer that constrains the latent space to capture the spectral structure of images. Experiments demonstrate that our regularizer improves the generative performance of image tokenizers while incurring only a minimal loss in reconstruction fidelity.
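As a generic illustration of how SSM hidden states can capture frequency components, the sketch below runs a diagonal linear recurrence h_k = lam * h_{k-1} + u_k whose eigenvalues lie on the unit circle. It is not the regularizer proposed in the paper; the function name `diagonal_ssm_final_state`, the choice of eigenvalues, and the random 1-D input are all assumptions made for this example. Under these assumptions, the final hidden state of each channel equals one discrete Fourier coefficient of the input, up to a known phase factor.

```python
import numpy as np

def diagonal_ssm_final_state(u, eigenvalues):
    """Run h_k = lam * h_{k-1} + u_k per eigenvalue; return the final states.

    Hypothetical helper for illustration only (not from the paper).
    """
    h = np.zeros(len(eigenvalues), dtype=complex)
    for u_k in u:
        # One independent scalar recurrence per eigenvalue (diagonal state matrix).
        h = eigenvalues * h + u_k
    return h

N = 64
rng = np.random.default_rng(0)
u = rng.standard_normal(N)  # assumed 1-D input signal

# Assumed eigenvalues: the N-th roots of unity, one per frequency index m.
lam = np.exp(2j * np.pi * np.arange(N) / N)

h_final = diagonal_ssm_final_state(u, lam)

# After N steps, h_final[m] = sum_j lam[m]**(N-1-j) * u[j]. Since lam[m]**N == 1,
# multiplying by lam[m] gives sum_j u[j] * exp(-2j*pi*m*j/N), the m-th DFT coefficient.
spectrum = lam * h_final
print(np.allclose(spectrum, np.fft.fft(u)))
```

The explicit loop keeps the recurrence visible; a practical SSM would learn its eigenvalues and use a parallel scan or convolution instead of stepping through the sequence.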

Cite

@article{arxiv.2604.11089,
  title  = {Structured State-Space Regularization for Generation-Friendly Image Tokenization},
  author = {Jinsung Lee and Jaemin Oh and Namhun Kim and Dongwon Kim and Byung-Jun Yoon and Suha Kwak},
  journal= {arXiv preprint arXiv:2604.11089},
  year   = {2026}
}

Comments

Related blog posts: the "Towards 2-Dimensional State-Space Models" series at https://jinsingsangsung.github.io/collections/blog/