Related papers: Action Recognition Using Temporal Shift Module and…

Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization

We present our solution to the BinEgo-360 Challenge at ICCV 2025, which focuses on temporal action localization (TAL) in multi-perspective and multi-modal video settings. The challenge provides a dataset containing panoramic, third-person,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Anh-Kiet Duong, Petra Gomez-Krämer

An Effective End-to-End Solution for Multimodal Action Recognition

Recently, multimodal tasks have strongly advanced the field of action recognition with their rich multimodal information. However, due to the scarcity of tri-modal data, research on tri-modal action recognition tasks faces many challenges.…

Computer Vision and Pattern Recognition · Computer Science 2025-06-12 Songping Wang, Xiantao Hu, Yueming Lyu, Caifeng Shan

Temporal Segment Networks for Action Recognition in Videos

Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework…

Computer Vision and Pattern Recognition · Computer Science 2017-05-09 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes

Detecting and recognizing human action in videos with crowded scenes is a challenging problem due to the complex environment and diverse events. Prior works always fail to deal with this problem in two aspects: (1) lacking utilizing…

Computer Vision and Pattern Recognition · Computer Science 2020-10-19 Li Yuan, Yichen Zhou, Shuning Chang, Ziyuan Huang, Yunpeng Chen, Xuecheng Nie, Tao Wang, Jiashi Feng, Shuicheng Yan

Motion-driven Visual Tempo Learning for Video-based Action Recognition

Action visual tempo characterizes the dynamics and the temporal scale of an action, which is helpful to distinguish human actions that share high similarities in visual dynamics and appearance. Previous methods capture the visual tempo…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Yuanzhong Liu, Junsong Yuan, Zhigang Tu

Ensembles of Deep Neural Networks for Action Recognition in Still Images

Despite the fact that notable improvements have been made recently in the field of feature extraction and classification, human action recognition is still challenging, especially in images, in which, unlike videos, there is no motion.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-24 Sina Mohammadi, Sina Ghofrani Majelan, Shahriar B. Shokouhi

Mobile Video Action Recognition

Video action recognition, which is topical in computer vision and video analysis, aims to allocate a short video clip to a pre-defined category such as brushing hair or climbing stairs. Recent works focus on action recognition with deep…

Computer Vision and Pattern Recognition · Computer Science 2019-08-28 Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Zhiwu Lu, Ji-Rong Wen

Application of Transfer Learning Approaches in Multimodal Wearable Human Activity Recognition

Through this project, we researched transfer learning methods and their applications to real-world problems. By implementing and modifying various methods in transfer learning for our problem, we obtained an insight into the advantages and…

Machine Learning · Computer Science 2017-07-11 Hailin Chen, Shengping Cui, Sebastian Li

The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024

This report presents our method for Temporal Action Localisation (TAL), which focuses on identifying and classifying actions within specific time intervals throughout a video sequence. We employ a data augmentation technique by expanding…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Yinan Han, Qingyuan Jiang, Hongming Mei, Yang Yang, Jinhui Tang

Temporal-Spatial Mapping for Action Recognition

Deep learning models have enjoyed great success for image related computer vision tasks like image classification and object detection. For video related tasks like human action recognition, however, the advancements are not as significant…

Computer Vision and Pattern Recognition · Computer Science 2018-09-12 Xiaolin Song, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jingyu Yang, Xiaoyan Sun

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

This paper studies the joint learning of action recognition and temporal localization in long, untrimmed videos. We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition,…

Computer Vision and Pattern Recognition · Computer Science 2017-04-05 Yi Zhu, Shawn Newsam

Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition

In this report, our approach to tackling the task of the ActivityNet 2018 Kinetics-600 challenge is described in detail. Though spatial-temporal modelling methods, which adopt either an end-to-end framework such as I3D or two-stage…

Computer Vision and Pattern Recognition · Computer Science 2018-06-28 Dongliang He, Fu Li, Qijie Zhao, Xiang Long, Yi Fu, Shilei Wen

Ensemble Modeling for Multimodal Visual Action Recognition

In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21]…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. The temporal relation is complex in those datasets, including challenges like composite actions and co-occurring actions.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond

Top-1 Solution of Multi-Moments in Time Challenge 2019

In this technical report, we briefly introduce the solutions of our team 'Efficient' for the Multi-Moments in Time challenge at ICCV 2019. We first conduct several experiments with popular image-based action recognition methods TRN, TSN,…

Computer Vision and Pattern Recognition · Computer Science 2020-03-16 Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan

Context-aware Proposal Network for Temporal Action Detection

This technical report presents our first-place winning solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge. The task aims to localize temporal boundaries of action instances with specific classes in long…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

The modeling, computational cost, and accuracy of traditional spatio-temporal networks are the three most heavily studied research topics in video action recognition. The traditional 2D convolution has a low computational cost, but it cannot…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Zhaoqilin Yang, Gaoyun An

Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation

Temporal action segmentation classifies the action of each frame in (long) video sequences. Due to the high cost of frame-wise labeling, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on…

Computer Vision and Pattern Recognition · Computer Science 2021-12-09 Dipika Singhania, Rahul Rahaman, Angela Yao

Contrastive Learning for Multimodal Human Activity Recognition with Limited Labeled Data

Human activity recognition serves as the foundation for various emerging applications. In recent years, researchers have used collaborative sensing of multi-source sensors to capture complex and dynamic human activities. However, multimodal…

Machine Learning · Computer Science 2026-04-28 Long Jing, Zhixiong Yang, Yajun Zhang, Xinlong Feng

CTM: Collaborative Temporal Modeling for Action Recognition

With the rapid development of digital multimedia, video understanding has become an important field. For action recognition, the temporal dimension plays an important role, and this is quite different from image recognition. In order to learn…

Computer Vision and Pattern Recognition · Computer Science 2020-02-11 Qian Liu, Tao Wang, Jie Liu, Yang Guan, Qi Bu, Longfei Yang