Related papers: Intelligent 3D Network Protocol for Multimedia Dat…

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic as that in 2D static image classification. Three main challenges exist including…

Computer Vision and Pattern Recognition · Computer Science 2018-07-30 Saining Xie , Chen Sun , Jonathan Huang , Zhuowen Tu , Kevin Murphy

Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks

Infrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. While the action recognition task on…

Computer Vision and Pattern Recognition · Computer Science 2017-05-19 Zhuolin Jiang , Viktor Rozgic , Sancar Adali

Efficient Two-Stream Motion and Appearance 3D CNNs for Video Classification

The video and action classification have extremely evolved by deep neural networks specially with two stream CNN using RGB and optical flow as inputs and they present outstanding performance in terms of video analysis. One of the…

Computer Vision and Pattern Recognition · Computer Science 2016-09-05 Ali Diba , Ali Mohammad Pazandeh , Luc Van Gool

Deep Learning Approaches for Human Action Recognition in Video Data

Human action recognition in videos is a critical task with significant implications for numerous applications, including surveillance, sports analytics, and healthcare. The challenge lies in creating models that are both precise in their…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Yufei Xie

An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos

We propose a novel scheme for human action recognition in videos, using a 3-dimensional Convolutional Neural Network (3D CNN) based classifier. Traditionally in deep learning based human activity recognition approaches, either a few random…

Computer Vision and Pattern Recognition · Computer Science 2020-02-10 S. H. Shabbeer Basha , Viswanath Pulabaigari , Snehasis Mukherjee

Beyond Short Snippets: Deep Networks for Video Classification

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep…

Computer Vision and Pattern Recognition · Computer Science 2015-04-14 Joe Yue-Hei Ng , Matthew Hausknecht , Sudheendra Vijayanarasimhan , Oriol Vinyals , Rajat Monga , George Toderici

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

How can we collect and use a video dataset to further improve spatiotemporal 3D Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question in video recognition, we have conducted an exploration study using a…

Computer Vision and Pattern Recognition · Computer Science 2020-04-13 Hirokatsu Kataoka , Tenga Wakamiya , Kensho Hara , Yutaka Satoh

Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Alexandros Stergiou , Ronald Poppe

Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

We conduct an in-depth exploration of different strategies for doing event detection in videos using convolutional neural networks (CNNs) trained for image classification. We study different ways of performing spatial and temporal pooling,…

Computer Vision and Pattern Recognition · Computer Science 2015-05-11 Shengxin Zha , Florian Luisier , Walter Andrews , Nitish Srivastava , Ruslan Salakhutdinov

A Framework Combining 3D CNN and Transformer for Video-Based Behavior Recognition

Video-based behavior recognition is essential in fields such as public safety, intelligent surveillance, and human-computer interaction. Traditional 3D Convolutional Neural Network (3D CNN) effectively capture local spatiotemporal features…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Xiuliang Zhang , Tadiwa Elisha Nyamasvisva , Chuntao Liu

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Chun-Fu Chen , Rameswar Panda , Kandan Ramakrishnan , Rogerio Feris , John Cohn , Aude Oliva , Quanfu Fan

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to…

Computer Vision and Pattern Recognition · Computer Science 2017-08-03 Rui Hou , Chen Chen , Mubarak Shah

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

It remains a challenge to efficiently extract spatialtemporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks which present…

Computer Vision and Pattern Recognition · Computer Science 2017-06-08 Hong Liu , Juanhui Tu , Mengyuan Liu

Class Feature Pyramids for Video Explanation

Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Alexandros Stergiou , Georgios Kapidis , Grigorios Kalliatakis , Christos Chrysoulas , Ronald Poppe , Remco Veltkamp

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Despite the success in still image recognition, deep neural networks for spatiotemporal signal tasks (such as human action recognition in videos) still suffers from low efficacy and inefficiency over the past years. Recently, human experts…

Computer Vision and Pattern Recognition · Computer Science 2020-04-13 Yizhou Zhou , Xiaoyan Sun , Chong Luo , Zheng-Jun Zha , Wenjun Zeng

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial…

Computer Vision and Pattern Recognition · Computer Science 2015-04-08 Zuxuan Wu , Xi Wang , Yu-Gang Jiang , Hao Ye , Xiangyang Xue

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep learning framework that integrates useful clues from…

Multimedia · Computer Science 2017-06-15 Yu-Gang Jiang , Zuxuan Wu , Jinhui Tang , Zechao Li , Xiangyang Xue , Shih-Fu Chang

3D Convolutional with Attention for Action Recognition

Human action recognition is one of the challenging tasks in computer vision. The current action recognition methods use computationally expensive models for learning spatio-temporal dependencies of the action. Models utilizing RGB channels…

Computer Vision and Pattern Recognition · Computer Science 2022-06-07 Labina Shrestha , Shikha Dubey , Farrukh Olimov , Muhammad Aasim Rafique , Moongu Jeon

Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks

Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks…

Computer Vision and Pattern Recognition · Computer Science 2015-10-05 Lin Sun , Kui Jia , Dit-Yan Yeung , Bertram E. Shi

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels…

Computer Vision and Pattern Recognition · Computer Science 2018-04-03 Kensho Hara , Hirokatsu Kataoka , Yutaka Satoh