Variational Self-Supervised Learning

Machine Learning 2025-05-02 v3 Computer Vision and Pattern Recognition

Abstract

We present Variational Self-Supervised Learning (VSSL), a novel framework that combines variational inference with self-supervised learning to enable efficient, decoder-free representation learning. Unlike traditional VAEs that rely on input reconstruction via a decoder, VSSL symmetrically couples two encoders with Gaussian outputs. A momentum-updated teacher network defines a dynamic, data-dependent prior, while the student encoder produces an approximate posterior from augmented views. The reconstruction term in the ELBO is replaced with a cross-view denoising objective, preserving the analytical tractability of Gaussian KL divergence. We further introduce cosine-based formulations of KL and log-likelihood terms to enhance semantic alignment in high-dimensional latent spaces. Experiments on CIFAR-10, CIFAR-100, and ImageNet-100 show that VSSL achieves performance competitive with or superior to leading self-supervised methods, including BYOL and MoCo v3. VSSL offers a scalable, probabilistically grounded approach to learning transferable representations without generative reconstruction, bridging the gap between variational modeling and modern self-supervised techniques.
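The two ingredients the abstract relies on, a closed-form KL divergence between Gaussian encoder outputs and a momentum-updated teacher, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gaussian_kl` and `ema_update`, the diagonal-covariance assumption, and the momentum value are hypothetical choices made for illustration, and the cosine-based variants mentioned above are not shown because the abstract does not specify them.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions.

    q plays the role of the student posterior, p the teacher-defined prior.
    (Diagonal covariance is an assumption made for this sketch.)
    """
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total

def ema_update(teacher, student, momentum=0.99):
    """Exponential moving average: nudge each teacher parameter toward the student.

    The momentum value is a placeholder, not a setting taken from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

# Identical posterior and prior: the divergence vanishes.
kl_same = gaussian_kl([0.0, 1.0], [1.0, 0.5], [0.0, 1.0], [1.0, 0.5])

# Unit-variance Gaussians whose means differ by 1 in one dimension.
kl_shift = gaussian_kl([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])

# One momentum step on a toy two-parameter "network".
new_teacher = ema_update([0.0, 2.0], [1.0, 0.0], momentum=0.9)
```

Because the KL term has this closed form, it needs no sampling or decoder to evaluate, which is the tractability the abstract refers to; the teacher is only ever changed through the averaging step rather than by gradient descent.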

Cite

@article{arxiv.2504.04318,
  title  = {Variational Self-Supervised Learning},
  author = {Mehmet Can Yavuz and Berrin Yanikoglu},
  journal= {arXiv preprint arXiv:2504.04318},
  year   = {2025}
}