Related papers: Multi-Modality Co-Learning for Efficient Skeleton-…

Skeleton-based Action Recognition via Adaptive Cross-Form Learning

Skeleton-based action recognition aims to project skeleton sequences to action categories, where skeleton sequences are derived from multiple forms of pre-detected points. Compared with earlier methods that focus on exploring single-form…

Computer Vision and Pattern Recognition · Computer Science 2022-07-01 Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song

An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition

Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training. Previous research has focused on aligning sequences' visual and semantic spatial…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Haojun Xu, Yan Gao, Jie Li, Xinbo Gao

Universal Skeleton Understanding via Differentiable Rendering and MLLMs

Multimodal large language models (MLLMs) exhibit strong visual-language reasoning, yet cannot process structured, non-visual data such as human skeletons. Existing methods either compress skeleton dynamics into lossy feature vectors for…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Ziyi Wang, Peiming Li, Xinshun Wang, Yang Tang, Kai-Kuang Ma, Mengyuan Liu

Multi-Modality Collaborative Learning for Sentiment Analysis

Multimodal sentiment analysis (MSA) identifies individuals' sentiment states in videos by integrating visual, audio, and text modalities. Despite progress in existing methods, the inherent modality heterogeneity limits the effective capture…

Machine Learning · Computer Science 2025-12-19 Shanmin Wang, Chengguang Liu, Qingshan Liu

MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition

Contrastive learning has gained significant attention in skeleton-based action recognition for its ability to learn robust representations from unlabeled data. However, existing methods rely on a single skeleton convention, which limits…

Computer Vision and Pattern Recognition · Computer Science 2025-08-21 Mert Kiray, Alvaro Ritter, Nassir Navab, Benjamin Busam

LLM Enhanced Action Recognition via Hierarchical Global-Local Skeleton-Language Model

Skeleton-based human action recognition has achieved remarkable progress in recent years. However, most existing GCN-based methods rely on short-range motion topologies, which not only struggle to capture long-range joint dependencies and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Ruosi Wang, Fangwei Zuo, Lei Li, Zhaoqiang Xia

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Action recognition has been a heated topic in computer vision for its wide application in vision systems. Previous approaches achieve improvement by fusing the modalities of the skeleton sequence and RGB video. However, such methods have a…

Computer Vision and Pattern Recognition · Computer Science 2022-02-24 Xiaoguang Zhu, Ye Zhu, Haoyu Wang, Honglin Wen, Yan Yan, Peilin Liu

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition

With the prevalence of RGB-D cameras, multi-modal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2020-02-03 Sijie Song, Jiaying Liu, Yanghao Li, Zongming Guo

MS$^2$L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition

In this paper, we address self-supervised representation learning from human skeletons for action recognition. Previous methods, which usually learn feature presentations from a single reconstruction task, may come across the overfitting…

Computer Vision and Pattern Recognition · Computer Science 2020-10-15 Lilang Lin, Sijie Song, Wenhan Yan, Jiaying Liu

Skeleton Aware Multi-modal Sign Language Recognition

Sign language is commonly used by deaf or speech impaired people to communicate but requires significant effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing signs…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu

RCMCL: A Unified Contrastive Learning Framework for Robust Multi-Modal (RGB-D, Skeleton, Point Cloud) Action Understanding

Human action recognition (HAR) with multi-modal inputs (RGB-D, skeleton, point cloud) can achieve high accuracy but typically relies on large labeled datasets and degrades sharply when sensors fail or are noisy. We present Robust…

Signal Processing · Electrical Eng. & Systems 2025-11-18 Hasan Akgul, Mari Eplik, Javier Rojas, Akira Yamamoto, Rajesh Kumar, Maya Singh

Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

Multimodal human action understanding is a significant problem in computer vision, with the central challenge being the effective utilization of the complementarity among diverse modalities while maintaining model efficiency. However, most…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Hongsong Wang, Heng Fei, Bingxuan Dai, Jie Gui

Foundation Model for Skeleton-Based Human Action Understanding

Human action understanding serves as a foundational pillar in the field of intelligent motion perception. Skeletons serve as a modality- and device-agnostic representation for human modeling, and skeleton-based action understanding has…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Hongsong Wang, Wanjiang Weng, Junbo Wang, Fang Zhao, Guo-Sen Xie, Xin Geng, Liang Wang

Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning

Skeleton-based action recognition has made great progress recently, but many problems still remain unsolved. For example, most of the previous methods model the representations of skeleton sequences without abundant spatial structure…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan

Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble

Sign language is commonly used by deaf or mute people to communicate but requires extensive effort to master. It is usually performed with the fast yet delicate movement of hand gestures, body posture, and even facial expressions. Current…

Computer Vision and Pattern Recognition · Computer Science 2021-10-13 Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu

Skeleton-Based Action Recognition with Synchronous Local and Non-local Spatio-temporal Learning and Frequency Attention

Benefiting from its succinctness and robustness, skeleton-based action recognition has recently attracted much attention. Most existing methods utilize local networks (e.g., recurrent, convolutional, and graph convolutional networks) to…

Computer Vision and Pattern Recognition · Computer Science 2019-06-13 Guyue Hu, Bo Cui, Shan Yu

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

Unsupervised pre-training has shown great success in skeleton-based action understanding recently. Existing works typically train separate modality-specific models, then integrate the multi-modal information for action understanding by a…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

Skeleton Focused Human Activity Recognition in RGB Video

The data-driven approach that learns an optimal representation of vision features like skeleton frames or RGB videos is currently a dominant paradigm for activity recognition. While great improvements have been achieved from existing single…

Computer Vision and Pattern Recognition · Computer Science 2020-04-30 Bruce X. B. Yu, Yan Liu, Keith C. C. Chan

Improving Skeleton-based Action Recognition with Robust Spatial and Temporal Features

Recently skeleton-based action recognition has made significant progresses in the computer vision community. Most state-of-the-art algorithms are based on Graph Convolutional Networks (GCN), and target at improving the network structure of…

Computer Vision and Pattern Recognition · Computer Science 2020-08-04 Zeshi Yang, Kangkang Yin

Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition

Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing methods in Micro-Action Recognition often overlook the inherent subtle…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang