Masked Diffusion as Self-supervised Representation Learner

Computer Vision and Pattern Recognition 2024-04-16 v4

Abstract

Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and have been used as strong pixel-level representation learners. This paper disentangles the relationship between the generative capability and the representation learning ability of diffusion models. We present the masked diffusion model (MDM), a scalable self-supervised representation learner for semantic segmentation that replaces the additive Gaussian noise of conventional diffusion with a masking mechanism. The proposed approach outperforms prior methods on both medical and natural image semantic segmentation, particularly in few-shot scenarios.
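
To make the contrast between the two corruption processes concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the patch size, the mask ratio, the fill value, or how masking varies across steps, so `patch`, `mask_ratio`, `fill`, and both function names are illustrative assumptions.

```python
import random


def gaussian_corrupt(image, sigma, rng):
    """Additive Gaussian noise: every pixel is perturbed by N(0, sigma^2)."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]


def mask_corrupt(image, mask_ratio, patch, rng, fill=0.0):
    """Patch masking (assumed form): a random subset of patch x patch blocks
    is overwritten with `fill`; all other pixels are left untouched."""
    h, w = len(image), len(image[0])
    # Top-left corners of all blocks on a regular grid.
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    n_masked = round(mask_ratio * len(blocks))
    masked = set(rng.sample(blocks, n_masked))
    out = [row[:] for row in image]  # copy so the input stays unchanged
    for r0, c0 in masked:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = fill
    return out, masked


rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]  # toy 8x8 single-channel image

noisy = gaussian_corrupt(image, sigma=0.5, rng=rng)
masked_image, masked_blocks = mask_corrupt(image, mask_ratio=0.5, patch=2, rng=rng)
```

In this sketch, Gaussian corruption alters every pixel slightly, while masking removes whole regions and leaves the remainder exact, so a network trained to restore the masked input must infer missing content from visible context.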

Cite

@article{arxiv.2308.05695,
  title  = {Masked Diffusion as Self-supervised Representation Learner},
  author = {Zixuan Pan and Jianxu Chen and Yiyu Shi},
  journal= {arXiv preprint arXiv:2308.05695},
  year   = {2024}
}