Related papers: Grouped Spatial-Temporal Aggregation for Efficient…

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D).…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction

Accurately perceiving dynamic environments is a fundamental task for autonomous driving and robotic systems. Existing methods inadequately utilize temporal information, relying mainly on local temporal interactions between adjacent frames…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Tianhao Li, Yang Li, Mengtian Li, Yisheng Deng, Weifeng Ge

Skeleton-based Action Recognition via Temporal-Channel Aggregation

Skeleton-based action recognition methods are limited by the semantic extraction of spatio-temporal skeletal maps. However, current methods have difficulty in effectively combining features from both temporal and spatial graph dimensions…

Computer Vision and Pattern Recognition · Computer Science 2022-08-09 Shengqin Wang, Yongji Zhang, Minghao Zhao, Hong Qi, Kai Wang, Fenglin Wei, Yu Jiang

Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation

Joint segmentation and classification of fine-grained actions is important for applications of human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large-scale action…

Computer Vision and Pattern Recognition · Computer Science 2016-10-03 Colin Lea, Austin Reiter, Rene Vidal, Gregory D. Hager

STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition

Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from the trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid…

Computer Vision and Pattern Recognition · Computer Science 2020-03-19 Xu Li, Jingwen Wang, Lin Ma, Kaihao Zhang, Fengzong Lian, Zhanhui Kang, Jinjun Wang

Group Contextualization for Video Recognition

Learning discriminative representation from the complex spatio-temporal dynamic space is essential for video recognition. On top of those stylized spatio-temporal computational units, further refining the learnt feature with axial contexts…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation

Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences. However, we observe that existing methods have weak spatio-temporal modeling capability due to two…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Yunheng Li, Zhongyu Li, Shanghua Gao, Qilong Wang, Qibin Hou, Ming-Ming Cheng

Temporal Convolutional Networks for Action Segmentation and Detection

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal…

Computer Vision and Pattern Recognition · Computer Science 2016-11-17 Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager

Temporal Reasoning Graph for Activity Recognition

Although great success has been achieved in activity analysis, many challenges remain. Most existing work in activity recognition pays more attention to designing efficient architectures or video sampling strategies. However, due to the…

Computer Vision and Pattern Recognition · Computer Science 2019-08-28 Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

STM: SpatioTemporal and Motion Encoding for Action Recognition

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion…

Computer Vision and Pattern Recognition · Computer Science 2019-08-19 Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan

Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Alexandros Stergiou, Ronald Poppe

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very…

Computer Vision and Pattern Recognition · Computer Science 2016-09-05 Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames. Existing methods adopt 3D CNNs over the video clip as a…

Computer Vision and Pattern Recognition · Computer Science 2021-05-17 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this survey paper, we have made a comparative analysis of different…

Computer Vision and Pattern Recognition · Computer Science 2021-01-12 Okan Köpüklü, Fabian Herzog, Gerhard Rigoll

Spatial-Temporal Perception with Causal Inference for Naturalistic Driving Action Recognition

Naturalistic driving action recognition is essential for vehicle cabin monitoring systems. However, the complexity of real-world backgrounds presents significant challenges for this task, and previous approaches have struggled with…

Computer Vision and Pattern Recognition · Computer Science 2025-03-07 Qing Chang, Wei Dai, Zhihao Shuai, Limin Yu, Yutao Yue

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or…

Computer Vision and Pattern Recognition · Computer Science 2018-12-12 Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally,…

Computer Vision and Pattern Recognition · Computer Science 2016-08-31 Colin Lea, Rene Vidal, Austin Reiter, Gregory D. Hager

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power…

Computer Vision and Pattern Recognition · Computer Science 2018-01-26 Sijie Yan, Yuanjun Xiong, Dahua Lin

Temporal Distinct Representation Learning for Action Recognition

Motivated by the previous success of Two-Dimensional Convolutional Neural Network (2D CNN) on image recognition, researchers endeavor to leverage it to characterize videos. However, one limitation of applying 2D CNN to analyze videos is…

Computer Vision and Pattern Recognition · Computer Science 2020-07-16 Junwu Weng, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xudong Jiang, Junsong Yuan

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and…

Computer Vision and Pattern Recognition · Computer Science 2020-05-20 Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang