FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

Computer Vision and Pattern Recognition 2026-03-05 v2

Abstract

The training of diffusion models is computationally intensive, making effective pre-training essential. However, real-world deployments often demand models of variable sizes due to diverse memory and computational constraints, posing challenges when corresponding pre-trained versions are unavailable. To address this, we propose FINE, a novel pre-training method whose resulting model can flexibly factorize its knowledge into fundamental components, termed learngenes, enabling direct initialization of models of various sizes and eliminating the need for repeated pre-training. Rather than optimizing a conventional full-parameter model, FINE represents each layer's weights as the product of $U_{\star}$, $\Sigma_{\star}^{(l)}$, and $V_{\star}^\top$, where $U_{\star}$ and $V_{\star}$ serve as size-agnostic learngenes shared across layers, while $\Sigma_{\star}^{(l)}$ remains layer-specific. By jointly training these components, FINE forms a decomposable and transferable knowledge structure that allows efficient initialization through flexible recombination of learngenes, requiring only light retraining of $\Sigma_{\star}^{(l)}$ on limited data. Extensive experiments demonstrate the efficiency of FINE, achieving state-of-the-art performance in initializing variable-sized models across diverse resource-constrained deployments. Furthermore, models initialized by FINE effectively adapt to diverse tasks, showcasing the task-agnostic versatility of learngenes.
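
The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sizes `d`, `r`, and `L`, the helper `compose_weights`, the diagonal form of each layer-specific factor, and the rule of slicing the leading rows of the shared factors to obtain a narrower model are all assumptions made for illustration. The abstract only states that each layer's weight is the product $U_{\star} \Sigma_{\star}^{(l)} V_{\star}^\top$ with $U_{\star}$ and $V_{\star}$ shared across layers.

```python
import numpy as np

rng = np.random.default_rng(0)


def compose_weights(U, V, sigmas):
    """Build one weight matrix per layer as U @ diag(sigma_l) @ V.T.

    Assumes each layer-specific factor is diagonal and stored as a vector.
    """
    # (U * s) scales column j of U by s[j], which equals U @ np.diag(s).
    return [(U * s) @ V.T for s in sigmas]


# Hypothetical sizes: layer width d, factor rank r, number of layers L.
d, r, L = 8, 4, 3

# Shared factors (the "learngenes" in the abstract's terminology).
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))

# Layer-specific factors, one vector of r values per layer.
sigmas = [rng.standard_normal(r) for _ in range(L)]

weights = compose_weights(U, V, sigmas)

# Assumed recombination rule for a smaller target model: keep the leading
# rows of the shared factors and start from fresh layer-specific factors,
# which would then be retrained on limited data.
d_small, L_small = 6, 2
U_small, V_small = U[:d_small], V[:d_small]
sigmas_small = [np.ones(r) for _ in range(L_small)]

small_weights = compose_weights(U_small, V_small, sigmas_small)
```

Under these assumptions, the shared factors hold `2 * d * r` values regardless of depth, and each additional layer adds only `r` values, which is what makes reusing them across target sizes cheap in this sketch.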

Cite

@article{arxiv.2409.19289,
  title  = {FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models},
  author = {Yucheng Xie and Fu Feng and Ruixiao Shi and Jianlu Shen and Jing Wang and Yong Rui and Xin Geng},
  journal= {arXiv preprint arXiv:2409.19289},
  year   = {2026}
}