Bootstrapped Representation Learning for Skeleton-Based Action Recognition

Computer Vision and Pattern Recognition 2022-04-20 v2

Abstract

In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) to representation learning on skeleton sequence data and propose a new data augmentation strategy that includes two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, which makes use of the unlabeled samples a second time. We conduct extensive experiments on the NTU-60 and NTU-120 datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on both linear evaluation and semi-supervised benchmarks.
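
The abstract gives no implementation details, so the following is only a minimal sketch of the generic BYOL objective that the method builds on, not the authors' code. It shows the normalized regression loss between an online prediction and a target projection, and the exponential-moving-average (EMA) update of the target network. The function names, the NumPy setting, and the decay value `tau=0.99` are assumptions made for illustration; the skeleton encoder, the two asymmetric augmentation pipelines, and the multi-viewpoint sampling are not modeled here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def byol_loss(online_pred, target_proj):
    """Per-sample BYOL regression loss: 2 - 2 * cosine similarity.

    online_pred: predictor output of the online network for one augmented view.
    target_proj: projection of the target network for the other augmented view
                 (treated as a constant; no gradient flows through it in BYOL).
    """
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter toward its online counterpart.

    tau is the EMA decay; 0.99 is an illustrative value, not taken from the paper.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

In a training loop, the loss would typically be symmetrized by swapping the two views and summing both terms, with `ema_update` applied after each optimizer step on the online network.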

Cite

@article{arxiv.2202.02232,
  title  = {Bootstrapped Representation Learning for Skeleton-Based Action Recognition},
  author = {Olivier Moliner and Sangxia Huang and Kalle Åström},
  journal= {arXiv preprint arXiv:2202.02232},
  year   = {2022}
}

Comments

Accepted: 2022 IEEE CVPR Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU)
