Dimensionality-Varying Diffusion Process

Machine Learning 2022-11-30 v1 Computer Vision and Pattern Recognition

Abstract

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension. We argue that, considering the spatial redundancy in image signals, there is no need to maintain a high dimensionality throughout the evolution process, especially in the early generation phase. To this end, we make a theoretical generalization of the forward diffusion process via signal decomposition. Concretely, we decompose an image into multiple orthogonal components and control the attenuation of each component when perturbing the image. That way, as the noise strength increases, we are able to diminish those inconsequential components and thus use a lower-dimensional signal to represent the source while barely losing information. Such a reformulation allows the dimensionality to vary in both training and inference of diffusion models. Extensive experiments on a range of datasets suggest that our approach substantially reduces the computational cost and achieves on-par or even better synthesis performance compared to baseline methods. We also show that our strategy facilitates high-resolution image synthesis and improves the FID of a diffusion model trained on FFHQ at $1024\times1024$ resolution from 52.40 to 10.46. Code and models will be made publicly available.
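
To make the idea of per-component attenuation concrete, the following is a minimal sketch and not the paper's formulation. It assumes one particular orthogonal decomposition (a block-average component plus its residual); the function names `split_components` and `perturb`, the pooling `factor`, and the scale parameters are all hypothetical and chosen only for illustration.

```python
import numpy as np

def split_components(x, factor=2):
    """Split an image into a block-average component and its residual.

    Assumption: block averaging stands in for the decomposition; the
    abstract does not specify which orthogonal components are used.
    """
    h, w = x.shape
    blocks = x.reshape(h // factor, factor, w // factor, factor)
    coarse = blocks.mean(axis=(1, 3))  # low-dimensional summary
    # Upsample the summary back to full size by repeating each entry.
    low = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    high = x - low  # residual, orthogonal to `low`
    return coarse, low, high

def perturb(x, low_scale, high_scale, sigma, rng, factor=2):
    """Attenuate each component separately, then add Gaussian noise."""
    _, low, high = split_components(x, factor)
    noise = rng.standard_normal(x.shape)
    return low_scale * low + high_scale * high + sigma * noise

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
coarse, low, high = split_components(image)
print("inner product of components:", float(np.sum(low * high)))
print("coarse shape:", coarse.shape)
```

In this sketch, once `high_scale` has decayed to zero the signal part of the perturbed image is fully determined by `coarse`, which has a quarter as many entries as `image`. That is the sense in which a lower-dimensional signal could represent the source at high noise levels.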

Cite

@article{arxiv.2211.16032,
  title  = {Dimensionality-Varying Diffusion Process},
  author = {Han Zhang and Ruili Feng and Zhantao Yang and Lianghua Huang and Yu Liu and Yifei Zhang and Yujun Shen and Deli Zhao and Jingren Zhou and Fan Cheng},
  journal= {arXiv preprint arXiv:2211.16032},
  year   = {2022}
}