Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

Machine Learning 2022-02-22 v2 Computer Vision and Pattern Recognition

Abstract

In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT. The DVAE is a latent-variable deep generative model that can be seen as an extension of the variational autoencoder to the modeling of temporal sequences. It is included in DVAE-UMOT to model the objects' dynamics, after being pre-trained on an unlabeled synthetic dataset of single-object trajectories. The distributions and parameters of DVAE-UMOT are then estimated on each multi-object sequence to be tracked, using the principles of variational inference: definition of an approximate posterior distribution of the latent variables and maximization of the corresponding evidence lower bound of the data likelihood function. DVAE-UMOT is shown experimentally to compete well with, and even surpass, the performance of two state-of-the-art probabilistic MOT models. Code and data are publicly available.
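
For reference, the evidence lower bound mentioned in the abstract has the following generic form in variational inference. This is only a sketch in assumed notation, not the paper's own: $o$ stands for the observed data, $z$ for the latent variables, $p_\theta$ for the generative model, and $q$ for the approximate posterior. The specific factorization used by DVAE-UMOT is not given here.

```latex
\begin{align}
  \mathcal{L}(q, \theta)
    &= \mathbb{E}_{q(z \mid o)}\big[ \ln p_\theta(o, z) - \ln q(z \mid o) \big] \\
    &= \ln p_\theta(o) - \mathrm{KL}\big( q(z \mid o) \,\|\, p_\theta(z \mid o) \big)
    \;\le\; \ln p_\theta(o)
\end{align}
```

Since the Kullback-Leibler term is non-negative, maximizing $\mathcal{L}$ over $q$ tightens the bound on the log-likelihood, and maximizing it over $\theta$ raises the bound itself.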

Cite

@article{arxiv.2202.09315,
  title  = {Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder},
  author = {Xiaoyu Lin and Laurent Girin and Xavier Alameda-Pineda},
  journal= {arXiv preprint arXiv:2202.09315},
  year   = {2022}
}