Related papers: Two Stream Self-Supervised Learning for Action Rec…

Two-Stream Convolutional Networks for Action Recognition in Videos

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between…

Computer Vision and Pattern Recognition · Computer Science 2014-11-13 Karen Simonyan, Andrew Zisserman

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Wearable cameras are becoming more and more popular in several applications, increasing the interest of the research community in developing approaches for recognizing actions from the first-person point of view. An open challenge in…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Mirco Planamente, Andrea Bottino, Barbara Caputo

Two-Stream temporal transformer for video action classification

Motion representation plays an important role in video understanding and has many applications including action recognition, robot and autonomous guidance or others. Lately, transformer networks, through their self-attention mechanism…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Nattapong Kurpukdee, Adrian G. Bors

Two-Stream Action Recognition-Oriented Video Super-Resolution

We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, instead of for visual quality. The popular action recognition methods based on convolutional networks, exemplified by…

Computer Vision and Pattern Recognition · Computer Science 2020-03-13 Haochen Zhang, Dong Liu, Zhiwei Xiong

ActionVLAD: Learning spatio-temporal aggregation for action classification

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream networks…

Computer Vision and Pattern Recognition · Computer Science 2017-04-11 Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

Learning Appearance-motion Normality for Video Anomaly Detection

Video anomaly detection is a challenging task in the computer vision community. Most single task-based methods do not consider the independence of unique spatial and temporal patterns, while two-stream structures lack the exploration of the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Yang Liu, Jing Liu, Mengyang Zhao, Dingkang Yang, Xiaoguang Zhu, Liang Song

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition

This paper proposes two-stream flow-guided convolutional attention networks for action recognition in videos. The central idea is that optical flows, when properly compensated for the camera motion, can be used to guide attention to the…

Computer Vision and Pattern Recognition · Computer Science 2017-08-31 An Tran, Loong-Fah Cheong

Hidden Two-Stream Convolutional Networks for Action Recognition

Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for…

Computer Vision and Pattern Recognition · Computer Science 2018-10-31 Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition. Recent self-supervised approaches have used spatial context [9, 34] as well as temporal coherency [32] but a…

Computer Vision and Pattern Recognition · Computer Science 2018-08-24 Unaiza Ahsan, Rishi Madhok, Irfan Essa

Exploring Temporal Information for Improved Video Understanding

In this dissertation, I present my work towards exploring temporal information for better video understanding. Specifically, I have worked on two problems: action recognition and semantic segmentation. For action recognition, I have…

Computer Vision and Pattern Recognition · Computer Science 2019-05-28 Yi Zhu

Two-stream convolutional networks for end-to-end learning of self-driving cars

We propose a methodology to extend the concept of Two-Stream Convolutional Networks to perform end-to-end learning for self-driving cars with temporal cues. The system has the ability to learn spatiotemporal features by simultaneously…

Machine Learning · Computer Science 2018-12-18 Nelson Fernandez

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the…

Computer Vision and Pattern Recognition · Computer Science 2018-01-15 Mingze Xu, Aidean Sharghi, Xin Chen, David J Crandall

Two-stream Spatiotemporal Feature for Video QA Task

Understanding the content of videos is one of the core techniques for developing various helpful applications in the real world, such as recognizing various human actions for surveillance systems or customer behavior analysis in an…

Computer Vision and Pattern Recognition · Computer Science 2019-07-12 Chiwan Song, Woobin Im, Sung-eui Yoon

Two-Stream Networks for Lane-Change Prediction of Surrounding Vehicles

In highway scenarios, an alert human driver will typically anticipate early cut-in and cut-out maneuvers of surrounding vehicles using only visual cues. An automated system must anticipate these situations at an early stage too, to increase…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 David Fernández-Llorca, Mahdi Biparva, Rubén Izquierdo-Gonzalo, John K. Tsotsos

Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos

This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos, where only the ordered sequence of video-level actions is available during training. We propose a two-stream framework, which exploits…

Computer Vision and Pattern Recognition · Computer Science 2021-10-13 Reza Ghoddoosian, Saif Sayed, Vassilis Athitsos

Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition

Action recognition is an important research topic in computer vision. It is the basic work for visual understanding and has been applied in many fields. Since human actions can vary in different environments, it is difficult to infer…

Computer Vision and Pattern Recognition · Computer Science 2019-10-23 Dong Cao, Lisha Xu, Dongdong Zhang

CAST: Cross-Attention in Space and Time for Video Action Recognition

Recognizing human actions in videos requires spatial and temporal understanding. Most existing action recognition models lack a balanced spatio-temporal understanding of videos. In this work, we propose a novel two-stream architecture,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Dongho Lee, Jongseo Lee, Jinwoo Choi

Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification

Video classification is highly important with wide applications, such as video search and intelligent surveillance. Video naturally consists of static and motion information, which can be represented by frame and optical flow. Recently,…

Computer Vision and Pattern Recognition · Computer Science 2017-11-10 Yuxin Peng, Yunzhen Zhao, Junchao Zhang