Related papers: Action Representation Using Classifier Decision Bo…

Video Representation Learning Using Discriminative Pooling

Popular deep models for action recognition in videos generate independent predictions for short clips, which are then pooled heuristically to assign an action label to the full video segment. As not all frames may characterize the…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-02 · Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould

Discriminative Video Representation Learning Using Support Vector Classifiers

Most popular deep models for action recognition in videos generate independent predictions for short clips, which are then pooled heuristically to assign an action label to the full video segment. As not all frames may characterize the…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-09 · Jue Wang, Anoop Cherian

Discriminatively Learned Hierarchical Rank Pooling Networks

In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present "discriminative rank pooling" in which the…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-31 · Basura Fernando, Stephen Gould

Action Recognition with Dynamic Image Networks

We introduce the concept of "dynamic image", a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-22 · Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that there are only a small number of frames which, together, contain sufficient…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-27 · Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma

Order-aware Convolutional Pooling for Video Based Action Recognition

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame. The pooling methods that they adopt, however, usually completely or partially neglect the…

Computer Vision and Pattern Recognition · Computer Science · 2016-02-02 · Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen

Rank Pooling for Action Recognition

We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g. how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the…

Computer Vision and Pattern Recognition · Computer Science · 2016-05-17 · Basura Fernando, Efstratios Gavves, Jose Oramas, Amir Ghodrati, Tinne Tuytelaars

Second-order Temporal Pooling for Action Recognition

Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated to video-level representations by computing statistics on these…

Computer Vision and Pattern Recognition · Computer Science · 2018-08-08 · Anoop Cherian, Stephen Gould

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

Encouraged by the success of Convolutional Neural Networks (CNNs) in image classification, much effort has recently been spent on applying CNNs to video-based action recognition problems. One challenge is that video contains a varying number of…

Computer Vision and Pattern Recognition · Computer Science · 2015-04-17 · Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen

Non-Linear Temporal Subspace Representations for Activity Recognition

Representations that can compactly and effectively capture the temporal evolution of semantic content are important to computer vision and machine learning algorithms that operate on multi-variate time-series data. We investigate such…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-30 · Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

Deep ConvNets have shown good performance in image classification tasks. However, deep video representation for action recognition remains a problem. The problem comes from two aspects: on one hand, current video ConvNets…

Computer Vision and Pattern Recognition · Computer Science · 2015-11-09 · Shichao Zhao, Yanbin Liu, Yahong Han, Richang Hong

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

Visual features are of vital importance for human action understanding in videos. This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-17 · Limin Wang, Yu Qiao, Xiaoou Tang

Generalized Rank Pooling for Activity Recognition

Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity. Usually, this pooling step discards the temporal…

Computer Vision and Pattern Recognition · Computer Science · 2017-07-25 · Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

Most successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips,…

Computer Vision and Pattern Recognition · Computer Science · 2017-01-20 · Anoop Cherian, Piotr Koniusz, Stephen Gould

Action Recognition with Deep Multiple Aggregation Networks

Most of the current action recognition algorithms are based on deep networks which stack multiple convolutional, pooling and fully connected layers. While convolutional and fully connected operations have been widely studied in the…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-09 · Ahmed Mazari, Hichem Sahbi

Human Action Recognition with Deep Temporal Pyramids

Deep convolutional neural networks (CNNs) are now achieving significant leaps in different pattern recognition tasks, including action recognition. Current CNNs are increasingly deep and data-hungry, and this makes their success…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-03 · Ahmed Mazari, Hichem Sahbi

Deep Local Video Feature for Action Recognition

We investigate the problem of representing an entire video using CNN features for human action recognition. Currently, limited by GPU memory, we have not been able to feed a whole video into CNN/RNNs for end-to-end learning. A common…

Computer Vision and Pattern Recognition · Computer Science · 2017-01-31 · Zhenzhong Lan, Yi Zhu, Alexander G. Hauptmann

A Discriminative CNN Video Representation for Event Detection

In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2014-11-17 · Zhongwen Xu, Yi Yang, Alexander G. Hauptmann

Action Recognition with Image Based CNN Features

Most human actions consist of complex temporal compositions of simpler actions. Action recognition tasks usually rely on complex handcrafted structures as features to represent the human action model. Convolutional Neural Nets…

Computer Vision and Pattern Recognition · Computer Science · 2015-12-15 · Mahdyar Ravanbakhsh, Hossein Mousavi, Mohammad Rastegari, Vittorio Murino, Larry S. Davis

No frame left behind: Full Video Action Recognition

Not all video frames are equally informative for recognizing an action. It is computationally infeasible to train deep networks on all video frames when actions develop over hundreds of frames. A common heuristic is uniformly sampling a…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-30 · Xin Liu, Silvia L. Pintea, Fatemeh Karimi Nejadasl, Olaf Booij, Jan C. van Gemert