Related papers: Intelligent 3D Network Protocol for Multimedia Dat…
Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic as that in 2D static image classification. Three main challenges exist including…
Infrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. While the action recognition task on…
The video and action classification have extremely evolved by deep neural networks specially with two stream CNN using RGB and optical flow as inputs and they present outstanding performance in terms of video analysis. One of the…
Human action recognition in videos is a critical task with significant implications for numerous applications, including surveillance, sports analytics, and healthcare. The challenge lies in creating models that are both precise in their…
We propose a novel scheme for human action recognition in videos, using a 3-dimensional Convolutional Neural Network (3D CNN) based classifier. Traditionally in deep learning based human activity recognition approaches, either a few random…
Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep…
How can we collect and use a video dataset to further improve spatiotemporal 3D Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question in video recognition, we have conducted an exploration study using a…
Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we…
We conduct an in-depth exploration of different strategies for doing event detection in videos using convolutional neural networks (CNNs) trained for image classification. We study different ways of performing spatial and temporal pooling,…
Video-based behavior recognition is essential in fields such as public safety, intelligent surveillance, and human-computer interaction. Traditional 3D Convolutional Neural Network (3D CNN) effectively capture local spatiotemporal features…
In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry…
Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to…
It remains a challenge to efficiently extract spatialtemporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks which present…
Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of…
Despite the success in still image recognition, deep neural networks for spatiotemporal signal tasks (such as human action recognition in videos) still suffers from low efficacy and inefficiency over the past years. Recently, human experts…
Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial…
Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep learning framework that integrates useful clues from…
Human action recognition is one of the challenging tasks in computer vision. The current action recognition methods use computationally expensive models for learning spatio-temporal dependencies of the action. Models utilizing RGB channels…
Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks…
The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels…