Related papers: Asynchronous Temporal Fields for Action Recognitio…

Coarse-Fine Networks for Temporal Activity Detection in Videos

In this paper, we introduce Coarse-Fine Networks, a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Kumara Kahatapitiya, Michael S. Ryoo

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely connected…

Computer Vision and Pattern Recognition · Computer Science 2018-07-10 Siddhartha Chandra, Camille Couprie, Iasonas Kokkinos

Intra- and Inter-Action Understanding via Temporal Action Parsing

Current methods for action recognition primarily rely on deep convolutional networks to derive feature embeddings of visual and motion features. While these methods have demonstrated remarkable performance on standard benchmarks, we are…

Computer Vision and Pattern Recognition · Computer Science 2020-05-21 Dian Shao, Yue Zhao, Bo Dai, Dahua Lin

Activity Graph Transformer for Temporal Action Localization

We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization, that receives a video as input and directly predicts a set of action instances that appear in the video. Detecting and localizing…

Computer Vision and Pattern Recognition · Computer Science 2021-01-29 Megha Nawhal, Greg Mori

Temporal Relational Reasoning in Videos

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the…

Computer Vision and Pattern Recognition · Computer Science 2018-07-26 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

End-to-End Fine-Grained Action Segmentation and Recognition Using Conditional Random Field Models and Discriminative Sparse Coding

Fine-grained action segmentation and recognition is an important yet challenging task. Given a long, untrimmed sequence of kinematic data, the task is to classify the action at each time frame and segment the time series into the correct…

Computer Vision and Pattern Recognition · Computer Science 2018-01-30 Effrosyni Mavroudi, Divya Bhaskara, Shahin Sefati, Haider Ali, René Vidal

From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation

The action anticipation task refers to predicting what action will happen based on observed videos, which requires the model to have a strong ability to summarize the present and then reason about the future. Experience and common sense…

Computer Vision and Pattern Recognition · Computer Science 2024-08-07 Xin Liu, Chao Hao, Zitong Yu, Huanjing Yue, Jingyu Yang

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video. We propose a bottleneck-based attention model that captures the evolution of the action, through…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Alexandros Stergiou, Dima Damen

Hierarchical Attention Network for Action Segmentation

The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in the video. Several attempts have been made to capture frame-level salient aspects through attention but they lack the…

Computer Vision and Pattern Recognition · Computer Science 2020-05-08 Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this…

Computer Vision and Pattern Recognition · Computer Science 2022-08-12 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos

As research on action recognition matures, the focus is shifting away from categorizing basic task-oriented actions using hand-segmented video datasets to understanding complex goal-oriented daily human activities in real-world settings.…

Computer Vision and Pattern Recognition · Computer Science 2016-03-18 Hilde Kuehne, Juergen Gall, Thomas Serre

Temporal Query Networks for Fine-grained Video Understanding

Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Chuhan Zhang, Ankush Gupta, Andrew Zisserman

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

In this paper, we newly introduce the concept of temporal attention filters, and describe how they can be used for human activity recognition from videos. Many high-level activities are often composed of multiple temporal parts (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2016-12-28 AJ Piergiovanni, Chenyou Fan, Michael S. Ryoo

Temporal Aggregate Representations for Long-Range Video Understanding

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular…

Computer Vision and Pattern Recognition · Computer Science 2020-08-03 Fadime Sener, Dipika Singhania, Angela Yao

Temporal Convolutional Networks for Action Segmentation and Detection

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal…

Computer Vision and Pattern Recognition · Computer Science 2016-11-17 Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager

Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

We present an approach for weakly supervised learning of human actions. Given a set of videos and an ordered list of the occurring actions, the goal is to infer start and end frames of the related action classes within the video and to…

Computer Vision and Pattern Recognition · Computer Science 2017-10-10 Alexander Richard, Hilde Kuehne, Juergen Gall

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. It is an important and challenging task as finding accurate human actions in both temporal and spatial space is important for analyzing…

Computer Vision and Pattern Recognition · Computer Science 2017-08-02 Zhenheng Yang, Jiyang Gao, Ram Nevatia

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach…

Computer Vision and Pattern Recognition · Computer Science 2019-04-22 Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz

Context-aware Proposal Network for Temporal Action Detection

This technical report presents our first-place winning solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge. The task aims to localize temporal boundaries of action instances with specific classes in long…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang

Timeception for Complex Action Recognition

This paper focuses on the temporal aspect for recognizing human activities in videos; an important visual cue that has long been undervalued. We revisit the conventional definition of activity and restrict it to Complex Action: a set of…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders