Related papers: Skeleton2vec: A Self-supervised Learning Framework…

Self-Supervised 3D Action Representation Learning with Skeleton Cloud Colorization

3D Skeleton-based human action recognition has attracted increasing attention in recent years. Most of the existing work focuses on supervised learning which requires a large number of labeled action sequences that are often expensive and…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-17 · Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Yongjian Hu, Alex C. Kot

SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition

Fully supervised skeleton-based action recognition has achieved great progress with the blooming of deep learning techniques. However, these methods require sufficient labeled data which is not easy to obtain. In contrast, self-supervised…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-12 · Wenhan Wu, Yilei Hua, Ce Zheng, Shiqian Wu, Chen Chen, Aidong Lu

Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization

The self-supervised pretraining paradigm has achieved great success in learning 3D action representations for skeleton-based action recognition using contrastive learning. However, learning effective representations for skeleton-based…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-06 · Qiushuo Cheng, Jingjing Liu, Catherine Morgan, Alan Whone, Majid Mirmehdi

Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond

Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for skeleton-based action understanding. Different from the image domain, skeleton data possesses sparser…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-29 · Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu

Point2Vec for Self-Supervised Representation Learning on Point Clouds

Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-23 · Karim Knaebel, Jonas Schult, Alexander Hermans, Bastian Leibe

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes…

Machine Learning · Computer Science · 2023-06-16 · Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

Skeleton-based action representation learning aims to interpret and understand human behaviors by encoding the skeleton sequences, which can be categorized into two primary training paradigms: supervised learning and self-supervised…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Ting Hu, Hong Cheng

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

Skeleton sequence representation learning has shown great advantages for action recognition due to its promising ability to model human joints and topology. However, the current methods usually require sufficient labeled data for training…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-18 · Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin

Skeleton-Contrastive 3D Action Representation Learning

This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-10 · Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models

Recent advances in large-scale pretrained vision models have demonstrated impressive capabilities across a wide range of downstream tasks, including cross-modal and multi-modal scenarios. However, their direct application to 3D human…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Siyuan Yang, Jun Liu, Hao Cheng, Chong Wang, Shijian Lu, Hedvig Kjellstrom, Weisi Lin, Alex C. Kot

Technical Report: Masked Skeleton Sequence Modeling for Learning Larval Zebrafish Behavior Latent Embeddings

In this report, we introduce a novel self-supervised learning method for extracting latent embeddings from behaviors of larval zebrafish. Drawing inspiration from Masked Modeling techniques utilized in image processing with Masked…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-26 · Lanxin Xu, Shuo Wang

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems. However, existing methods are either not entirely end-to-end or do not train joint…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-23 · Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning

Skeleton-based human action recognition has attracted increasing attention in recent years. However, most of the existing works focus on supervised learning, which requires a large number of annotated action sequences that are often…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-10 · Siyuan Yang, Jun Liu, Shijian Lu, Meng Hwa Er, Alex C. Kot

STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences

Self-supervised pretraining methods with masked prediction demonstrate remarkable within-dataset performance in skeleton-based action recognition. However, we show that, unlike contrastive learning approaches, they do not produce…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-11 · Soroush Mehraban, Mohammad Javad Rajabi, Andrea Iaboni, Babak Taati

Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples

Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on costly annotations of skeletal sequences while…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-30 · Zhigang Tu, Zhengbo Zhang, Jia Gong, Junsong Yuan, Bo Du

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised…

Machine Learning · Computer Science · 2022-10-27 · Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

MS²L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition

In this paper, we address self-supervised representation learning from human skeletons for action recognition. Previous methods, which usually learn feature representations from a single reconstruction task, may come across the overfitting…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-15 · Lilang Lin, Sijie Song, Wenhan Yan, Jiaying Liu

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing…

Machine Learning · Computer Science · 2022-07-18 · Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences

Self-supervised learning has demonstrated remarkable capability in representation learning for skeleton-based action recognition. Existing methods mainly focus on applying global data augmentation to generate different views of the skeleton…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-13 · Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang