SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation

Computer Vision and Pattern Recognition 2026-05-26 v2

Abstract

Normalizing flows (NFs) provide exact likelihoods and deterministic, invertible sampling, but have historically lagged behind diffusion models in large-scale image generation. We identify a key obstacle: an NF must learn a single invertible transport over the full ambient space, which makes it highly sensitive to the dimensionality of the representation. This leads to a semantic-capacity mismatch in modern visual representation spaces, where semantic information is compact but encoded in overcomplete features. We propose SRC-Flow, which introduces a Semantic Representation Compressor (SRC) that compresses high-dimensional RAE features into a low-dimensional semantic space before flow modeling, while preserving reconstruction through the frozen RAE decoder. This compact space reduces the modeling burden on the NF and enables effective likelihood-based generation in semantic representation space. We further adopt constant noise regularization tailored to the fixed unconditional bijection learned by flows. On ImageNet 256×256 and 512×512, SRC-Flow achieves state-of-the-art generation quality among normalizing flow methods, with gFID scores of 1.65 and 2.07, respectively, under classifier-free guidance, while retaining exact likelihood computation in the compact semantic representation space and deterministic invertible sampling at the flow level. Code and models will be available at https://github.com/longtaojiang/SRC-Flow.
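
To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a fixed linear projection as a stand-in for the learned compressor, a single affine coupling layer as the flow, and Gaussian noise with a constant standard deviation as the regularizer. All names and sizes (`D_FEAT`, `D_SEM`, `NOISE_STD`, `W_compress`, `coupling_forward`, `coupling_inverse`) are hypothetical. It shows the two properties the abstract relies on: an exact log-likelihood from the change-of-variables formula, and exact invertibility in the compact space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: overcomplete feature dim and compact semantic dim.
D_FEAT, D_SEM = 64, 8
NOISE_STD = 0.1  # hypothetical constant noise level

# Assumption: a fixed random linear map stands in for a learned compressor.
W_compress = rng.normal(size=(D_FEAT, D_SEM)) / np.sqrt(D_FEAT)

# Parameters of one affine coupling layer (these would be trained in practice).
half = D_SEM // 2
W_s = rng.normal(size=(half, half)) * 0.1
W_t = rng.normal(size=(half, half)) * 0.1


def coupling_forward(z):
    """Map data z to latent u; return u and log|det J| per sample."""
    z1, z2 = z[:, :half], z[:, half:]
    log_s = np.tanh(z1 @ W_s)  # bounded log-scale for numerical stability
    t = z1 @ W_t
    u2 = z2 * np.exp(log_s) + t
    return np.concatenate([z1, u2], axis=1), log_s.sum(axis=1)


def coupling_inverse(u):
    """Exact inverse of coupling_forward."""
    u1, u2 = u[:, :half], u[:, half:]
    log_s = np.tanh(u1 @ W_s)
    t = u1 @ W_t
    z2 = (u2 - t) * np.exp(-log_s)
    return np.concatenate([u1, z2], axis=1)


def log_likelihood(z):
    """Exact log p(z) = log N(u; 0, I) + log|det du/dz|."""
    u, log_det = coupling_forward(z)
    dim = z.shape[1]
    log_base = -0.5 * (u ** 2).sum(axis=1) - 0.5 * dim * np.log(2 * np.pi)
    return log_base + log_det


# Stand-in for high-dimensional encoder features of 4 images.
features = rng.normal(size=(4, D_FEAT))
z = features @ W_compress                            # compress to compact space
z_noisy = z + NOISE_STD * rng.normal(size=z.shape)   # constant-level noise

ll = log_likelihood(z_noisy)          # one exact log-likelihood per sample
u, _ = coupling_forward(z_noisy)
z_rec = coupling_inverse(u)           # invert the flow
max_err = np.abs(z_rec - z_noisy).max()
```

Because the flow operates on `D_SEM` dimensions instead of `D_FEAT`, the bijection it has to learn is correspondingly smaller, which is the intuition behind compressing before flow modeling. In this sketch the likelihood is exact only in the compact space, matching the scope of the claim in the abstract.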

Cite

@article{arxiv.2605.18267,
  title  = {SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation},
  author = {Longtao Jiang and Jianmin Bao and Zhendong Wang and Xin Tao and Pengfei Wan and Zhihui Li and Xiaojun Chang},
  journal= {arXiv preprint arXiv:2605.18267},
  year   = {2026}
}