Related papers: CTM: Collaborative Temporal Modeling for Action Re…

Motion-driven Visual Tempo Learning for Video-based Action Recognition

Action visual tempo characterizes the dynamics and the temporal scale of an action, which is helpful to distinguish human actions that share high similarities in visual dynamics and appearance. Previous methods capture the visual tempo…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Yuanzhong Liu, Junsong Yuan, Zhigang Tu

STM: SpatioTemporal and Motion Encoding for Action Recognition

Spatiotemporal and motion features are two complementary and crucial information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion…

Computer Vision and Pattern Recognition · Computer Science 2019-08-19 Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan

TDN: Temporal Difference Networks for Efficient Action Recognition

Temporal modeling still remains challenging for action recognition in videos. To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D).…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

TAM: Temporal Adaptive Module for Video Recognition

Video data is with complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities. To effectively capture this diverse motion pattern, this paper presents a new temporal adaptive module…

Computer Vision and Pattern Recognition · Computer Science 2021-08-19 Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, Tong Lu

Temporal Convolutional Networks for Action Segmentation and Detection

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal…

Computer Vision and Pattern Recognition · Computer Science 2016-11-17 Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager

Temporal-Spatial Mapping for Action Recognition

Deep learning models have enjoyed great success for image related computer vision tasks like image classification and object detection. For video related tasks like human action recognition, however, the advancements are not as significant…

Computer Vision and Pattern Recognition · Computer Science 2018-09-12 Xiaolin Song, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jingyu Yang, Xiaoyan Sun

Multi-Temporal Convolutions for Human Action Recognition in Videos

Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Alexandros Stergiou, Ronald Poppe

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

The modeling, computational cost, and accuracy of traditional Spatio-temporal networks are the three most concentrated research topics in video action recognition. The traditional 2D convolution has a low computational cost, but it cannot…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Zhaoqilin Yang, Gaoyun An

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally,…

Computer Vision and Pattern Recognition · Computer Science 2016-08-31 Colin Lea, Rene Vidal, Austin Reiter, Gregory D. Hager

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very…

Computer Vision and Pattern Recognition · Computer Science 2016-09-05 Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial…

Computer Vision and Pattern Recognition · Computer Science 2015-04-08 Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Understanding temporal information and how the visual world changes over time is a fundamental ability of intelligent systems. In video understanding, temporal information is at the core of many current challenges, including compression,…

Computer Vision and Pattern Recognition · Computer Science 2019-10-31 Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

Learning Representative Temporal Features for Action Recognition

In this paper, a novel video classification method is presented that aims to recognize different categories of third-person videos efficiently. Our motivation is to achieve a light model that could be trained with insufficient training…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Ali Javidani, Ahmad Mahmoudi-Aznaveh

Graph Convolutional Module for Temporal Action Localization in Videos

Temporal action localization has long been researched in computer vision. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage…

Computer Vision and Pattern Recognition · Computer Science 2021-12-02 Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Long-term Temporal Convolutions for Action Recognition

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-05 Gül Varol, Ivan Laptev, Cordelia Schmid

Comparison of Spatiotemporal Networks for Learning Video Related Tasks

Many methods for learning from video sequences involve temporally processing 2D CNN features from the individual frames or directly utilizing 3D convolutions within high-performing 2D CNN architectures. The focus typically remains on how to…

Computer Vision and Pattern Recognition · Computer Science 2020-09-17 Logan Courtney, Ramavarapu Sreenivas

TSM: Temporal Shift Module for Efficient Video Understanding

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN…

Computer Vision and Pattern Recognition · Computer Science 2019-08-23 Ji Lin, Chuang Gan, Song Han

Temporal Segment Networks for Action Recognition in Videos

Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework…

Computer Vision and Pattern Recognition · Computer Science 2017-05-09 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

Leveraging Temporal Contextualization for Video Action Recognition

We propose a novel framework for video understanding, called Temporally Contextualized CLIP (TC-CLIP), which leverages essential temporal information through global interactions in a spatio-temporal domain within a video. To be specific, we…

Computer Vision and Pattern Recognition · Computer Science 2024-07-25 Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han