Related papers: Learning Representative Temporal Features for Acti…

Temporal Distinct Representation Learning for Action Recognition

Motivated by the previous success of Two-Dimensional Convolutional Neural Network (2D CNN) on image recognition, researchers endeavor to leverage it to characterize videos. However, one limitation of applying 2D CNN to analyze videos is…

Computer Vision and Pattern Recognition · Computer Science 2020-07-16 Junwu Weng, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xudong Jiang, Junsong Yuan

Video Representation Learning by Recognizing Temporal Transformations

We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics. Our representations can be learned from data without human annotation and provide a substantial…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Simon Jenni, Givi Meishvili, Paolo Favaro

An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos

We propose a novel scheme for human action recognition in videos, using a 3-dimensional Convolutional Neural Network (3D CNN) based classifier. Traditionally in deep learning based human activity recognition approaches, either a few random…

Computer Vision and Pattern Recognition · Computer Science 2020-02-10 S. H. Shabbeer Basha, Viswanath Pulabaigari, Snehasis Mukherjee

Describing Videos by Exploiting Temporal Structure

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic…

Machine Learning · Statistics 2015-10-02 Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

Self-supervised Temporal Discriminative Learning for Video Representation Learning

Temporal cues in videos provide important information for recognizing actions accurately. However, temporal-discriminative features can hardly be extracted without using an annotated large-scale video action dataset for training. This paper…

Computer Vision and Pattern Recognition · Computer Science 2020-08-06 Jinpeng Wang, Yiqi Lin, Andy J. Ma, Pong C. Yuen

Long-term Temporal Convolutions for Action Recognition

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-05 Gül Varol, Ivan Laptev, Cordelia Schmid

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

The work in this paper is driven by the question how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular? Thus far, the vision community has focused on…

Computer Vision and Pattern Recognition · Computer Science 2017-11-23 Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool

A Closer Look at Spatiotemporal Convolutions for Action Recognition

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have…

Computer Vision and Pattern Recognition · Computer Science 2018-04-13 Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification. Three main challenges exist including…

Computer Vision and Pattern Recognition · Computer Science 2018-07-30 Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

A Correlation Based Feature Representation for First-Person Activity Recognition

In this paper, a simple yet efficient activity recognition method for first-person video is introduced. The proposed method is appropriate for representation of high-dimensional features such as those extracted from convolutional neural…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Reza Kahani, Alireza Talebpour, Ahmad Mahmoudi-Aznaveh

Efficient Modelling Across Time of Human Actions and Interactions

This thesis focuses on video understanding for human action and interaction recognition. We start by identifying the main challenges related to action recognition from videos and review how they have been addressed by current methods. Based…

Computer Vision and Pattern Recognition · Computer Science 2021-10-06 Alexandros Stergiou

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D).…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition

We address the problem of capturing temporal information for video classification in 2D networks, without increasing their computational cost. Existing approaches focus on modifying the architecture of 2D networks (e.g. by including filters…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara

3D Convolutional with Attention for Action Recognition

Human action recognition is one of the challenging tasks in computer vision. The current action recognition methods use computationally expensive models for learning spatio-temporal dependencies of the action. Models utilizing RGB channels…

Computer Vision and Pattern Recognition · Computer Science 2022-06-07 Labina Shrestha, Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon

Class Feature Pyramids for Video Explanation

Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Ronald Poppe, Remco Veltkamp

Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition

Visual attributes in individual video frames, such as the presence of characteristic objects and scenes, offer substantial information for action recognition in videos. With individual 2D video frame as input, visual attributes extraction…

Computer Vision and Pattern Recognition · Computer Science 2018-05-09 Yunfeng Wang, Wengang Zhou, Qilin Zhang, Houqiang Li

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally,…

Computer Vision and Pattern Recognition · Computer Science 2016-08-31 Colin Lea, Rene Vidal, Austin Reiter, Gregory D. Hager

TDN: Temporal Difference Networks for Efficient Action Recognition

Temporal modeling still remains challenging for action recognition in videos. To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu