Related papers: Blockwise Temporal-Spatial Pathway Network

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Despite the success of deep learning for static image understanding, it remains unclear what are the most effective network architectures for the spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or…

Computer Vision and Pattern Recognition · Computer Science 2018-12-12 Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen

Temporal Segment Networks for Action Recognition in Videos

Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework…

Computer Vision and Pattern Recognition · Computer Science 2017-05-09 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

Temporal Bilinear Networks for Video Action Recognition

Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared…

Computer Vision and Pattern Recognition · Computer Science 2018-11-27 Yanghao Li, Sijie Song, Yuqi Li, Jiaying Liu

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles…

Computer Vision and Pattern Recognition · Computer Science 2016-08-03 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition

Skeleton-based action recognition has become popular in recent years due to its efficiency and robustness. Most current methods adopt graph convolutional network (GCN) for topology modeling, but GCN-based methods are limited in…

Computer Vision and Pattern Recognition · Computer Science 2023-02-28 Jinzhao Luo, Lu Zhou, Guibo Zhu, Guojing Ge, Beiying Yang, Jinqiao Wang

Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Alexandros Stergiou, Ronald Poppe

Multi-Temporal Convolutions for Human Action Recognition in Videos

Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Alexandros Stergiou, Ronald Poppe

CTM: Collaborative Temporal Modeling for Action Recognition

With the rapid development of digital multimedia, video understanding has become an important field. For action recognition, temporal dimension plays an important role, and this is quite different from image recognition. In order to learn…

Computer Vision and Pattern Recognition · Computer Science 2020-02-11 Qian Liu, Tao Wang, Jie Liu, Yang Guan, Qi Bu, Longfei Yang

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) have an ability to directly extract spatio-temporal features from videos for action recognition. Although the 3D kernels tend to overfit because of a large number of…

Computer Vision and Pattern Recognition · Computer Science 2017-08-28 Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

The work in this paper is driven by the question how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular? Thus far, the vision community has focused on…

Computer Vision and Pattern Recognition · Computer Science 2017-11-23 Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool

Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition

Skeleton-based action recognition has attracted considerable attention due to its compact representation of the human body's skeletal structure. Many recent methods have achieved remarkable performance using graph convolutional networks…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Jungho Lee, Minhyeok Lee, Suhwan Cho, Sungmin Woo, Sungjun Jang, Sangyoun Lee

Spatio-Temporal Channel Correlation Networks for Action Classification

The work in this paper is driven by the question if spatio-temporal correlations are enough for 3D convolutional neural networks (CNN)? Most of the traditional 3D networks use local spatio-temporal features. We introduce a new block that…

Computer Vision and Pattern Recognition · Computer Science 2019-02-08 Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool

Temporal Transformer Networks with Self-Supervision for Action Recognition

In recent years, 2D Convolutional Networks-based video action recognition has encouragingly gained wide popularity; however, constrained by the lack of long-range non-linear temporal relation modeling and reverse motion information…

Computer Vision and Pattern Recognition · Computer Science 2021-12-20 Yongkang Zhang, Jun Li, Guoming Wu, Han Zhang, Zhiping Shi, Zhaoxun Liu, Zizhang Wu

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

There is significant progress in recognizing traditional human activities from videos focusing on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. Driver's activities are…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

Spatio-Temporal Fusion Networks for Action Recognition

The video based CNN works have focused on effective ways to fuse appearance and motion networks, but they typically lack utilizing temporal information over video frames. In this work, we present a novel spatio-temporal fusion network…

Computer Vision and Pattern Recognition · Computer Science 2019-06-18 Sangwoo Cho, Hassan Foroosh

STM: SpatioTemporal and Motion Encoding for Action Recognition

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion…

Computer Vision and Pattern Recognition · Computer Science 2019-08-19 Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan

Spatio-Temporal Self-Attention Network for Video Saliency Prediction

3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction that is explored in this paper. However, 3D convolution encodes visual representation merely on fixed…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Ziqiang Wang, Zhi Liu, Gongyang Li, Yang Wang, Tianhong Zhang, Lihua Xu, Jijun Wang

STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition

Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. We think the key to skeleton-based action recognition is a skeleton hanging in frames, so we focus on how the…

Computer Vision and Pattern Recognition · Computer Science 2023-12-07 Nguyen Huu Bao Long

Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

In this paper, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on…

Computer Vision and Pattern Recognition · Computer Science 2025-01-15 Yunzhi Zhuge, Hongyu Gu, Lu Zhang, Jinqing Qi, Huchuan Lu

Video BagNet: short temporal receptive fields increase robustness in long-term action recognition

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large…

Computer Vision and Pattern Recognition · Computer Science 2023-08-23 Ombretta Strafforello, Xin Liu, Klamer Schutte, Jan van Gemert