Related papers: Feature-Supervised Action Modality Transfer

Cross-modal knowledge distillation for action recognition

In this work, we address the problem of how a network for action recognition that has been trained on a modality like RGB videos can be adapted to recognize actions for another modality like sequences of 3D human poses. To this end, we extract…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Fida Mohammad Thoker, Juergen Gall

Students taught by multimodal teachers are superior action recognizers

The focal point of egocentric video understanding is modelling hand-object interactions. Standard models -- CNNs, Vision Transformers, etc. -- which receive RGB frames as input perform well; however, their performance improves further by…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Gorjan Radevski, Dusan Grujicic, Matthew Blaschko, Marie-Francine Moens, Tinne Tuytelaars

Modality Distillation with Multiple Stream Networks for Action Recognition

Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset could be accurately designed to include a variety of…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Nuno Garcia, Pietro Morerio, Vittorio Murino

Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition.…

Computer Vision and Pattern Recognition · Computer Science 2016-12-28 Amir Shahroudy, Tian-Tsong Ng, Yihong Gong, Gang Wang

Graph Distillation for Action Detection with Privileged Modalities

We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning…

Computer Vision and Pattern Recognition · Computer Science 2018-07-31 Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei

Skeleton Focused Human Activity Recognition in RGB Video

The data-driven approach that learns an optimal representation of vision features like skeleton frames or RGB videos is currently a dominant paradigm for activity recognition. While great improvements have been achieved from existing single…

Computer Vision and Pattern Recognition · Computer Science 2020-04-30 Bruce X. B. Yu, Yan Liu, Keith C. C. Chan

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers

Multimodal sensing has proven valuable for visual tracking, as different sensor types offer unique strengths in handling one specific challenging scene where object appearance varies. While a generalist model capable of leveraging all…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Eduard Zamfi, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

In video understanding, most cross-modal knowledge distillation (KD) methods are tailored for classification tasks, focusing on the discriminative representation of the trimmed videos. However, action detection requires not only…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Rui Dai, Srijan Das, Francois Bremond

Cross Modal Distillation for Supervision Transfer

In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as a supervisory signal for training representations for a new unlabeled…

Computer Vision and Pattern Recognition · Computer Science 2015-11-26 Saurabh Gupta, Judy Hoffman, Jitendra Malik

Multi-Modal RGB-D Scene Recognition Across Domains

Scene recognition is one of the basic problems in computer vision research with extensive applications in robotics. When available, depth images provide helpful geometric cues that complement the RGB texture information and help to identify…

Computer Vision and Pattern Recognition · Computer Science 2021-09-08 Andrea Ferreri, Silvia Bucci, Tatiana Tommasi

Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition

Gesture recognition is getting more and more popular due to various application possibilities in human-machine interaction. Existing multi-modal gesture recognition systems take multi-modal data as input to improve accuracy, but such…

Computer Vision and Pattern Recognition · Computer Science 2021-11-01 Dinghao Fan, Hengjie Lu, Shugong Xu, Shan Cao

Towards Robust Human Activity Recognition from RGB Video Stream with Limited Labeled Data

Human activity recognition based on video streams has received considerable attention in recent years. Due to the lack of depth information, RGB video based activity recognition performs poorly compared to RGB-D video based solutions. On the other…

Computer Vision and Pattern Recognition · Computer Science 2018-12-18 Krishanu Sarker, Mohamed Masoud, Saeid Belkasim, Shihao Ji

Unsupervised Learning of View-invariant Action Representations

Recent successes in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection

Temporal action detection aims to predict the time intervals and the classes of action instances in the video. Despite the promising performance, existing two-stream models exhibit slow inference speed due to their reliance on…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Pilhyeon Lee, Taeoh Kim, Minho Shim, Dongyoon Wee, Hyeran Byun

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition

With the prevalence of RGB-D cameras, multi-modal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2020-02-03 Sijie Song, Jiaying Liu, Yanghao Li, Zongming Guo

Feature Learning for Interaction Activity Recognition in RGBD Videos

This paper proposes a human activity recognition method which is based on features learned from 3D video data without incorporating domain knowledge. The experiments on data collected by RGBD cameras produce results outperforming other…

Computer Vision and Pattern Recognition · Computer Science 2015-08-11 Ngu Nguyen

Egocentric RGB+Depth Action Recognition in Industry-Like Settings

Action recognition from an egocentric viewpoint is a crucial perception task in robotics and enables a wide range of human-robot interactions. While most computer vision approaches prioritize the RGB camera, the Depth modality - which can…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks

This paper addresses the problem of real-time action recognition in trimmed videos, for which deep neural networks have defined the state-of-the-art performance in the recent literature. For attaining higher recognition accuracies with…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Marian K. Y. Boktor, Ahmad Al-Kabbany, Radwa Khalil, Said El-Khamy

Learning to see across Domains and Modalities

Deep learning has raised hopes and expectations as a general solution for many applications; indeed it has proven effective, but it also showed a strong dependence on large quantities of data. Luckily, it has been shown that, even when data…

Computer Vision and Pattern Recognition · Computer Science 2019-02-14 Fabio Maria Carlucci

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action…

Computer Vision and Pattern Recognition · Computer Science 2016-04-12 Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang