Related papers: Action Localization through Continual Predictive L…

Actor-centered Representations for Action Localization in Streaming Videos

Event perception tasks such as recognizing and localizing actions in streaming videos are essential for scaling to real-world application contexts. We tackle the problem of learning actor-centered representations through the notion of…

Computer Vision and Pattern Recognition · Computer Science 2022-12-01 Sathyanarayanan N. Aakur, Sudeep Sarkar

Self-Supervision by Prediction for Object Discovery in Videos

Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Beril Besbinar, Pascal Frossard

Pointly-Supervised Action Localization

This paper strives for spatio-temporal localization of human actions in videos. In the literature, the consensus is to achieve localization by training on bounding box annotations provided for each frame of each training video. As…

Computer Vision and Pattern Recognition · Computer Science 2018-10-02 Pascal Mettes, Cees G. M. Snoek

Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision

Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed as short sequences and require ground-truth annotations of each video…

Computer Vision and Pattern Recognition · Computer Science 2019-02-21 Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan

Towards Active Vision for Action Localization with Reactive Control and Predictive Learning

Visual event perception tasks such as action localization have primarily focused on supervised learning settings under a static observer, i.e., the camera is static and cannot be controlled by an algorithm. They are often restricted by the…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Shubham Trehan, Sathyanarayanan N. Aakur

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is…

Computer Vision and Pattern Recognition · Computer Science 2019-10-07 Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim

Unsupervised learning of action classes with continuous temporal embedding

The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently. One problem in this context arises from the need to define and label action boundaries to create annotations for training…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall

End-to-End Semi-Supervised Learning for Video Action Detection

In this work, we focus on semi-supervised learning for video action detection which utilizes both labeled as well as unlabeled data. We propose a simple end-to-end consistency based approach which effectively utilizes the unlabeled data.…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Akash Kumar, Yogesh Singh Rawat

Localizing Actions from Video Labels and Pseudo-Annotations

The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard to obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Pascal Mettes, Cees G. M. Snoek, Shih-Fu Chang

Learning to track for spatio-temporal action localization

We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame-level and scores them with a combination of static and motion CNN features. It then tracks…

Computer Vision and Pattern Recognition · Computer Science 2015-09-29 Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

Enhancing Single-Frame Supervision for Better Temporal Action Localization

Temporal action localization aims to identify the boundaries and categories of actions in videos, such as scoring a goal in a football match. Single-frame supervision has emerged as a labor-efficient way to train action localizers as it…

Human-Computer Interaction · Computer Science 2023-12-11 Changjian Chen, Jiashu Chen, Weikai Yang, Haoze Wang, Johannes Knittel, Xibin Zhao, Steffen Koch, Thomas Ertl, Shixia Liu

Encouraging LSTMs to Anticipate Actions Very Early

In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos. As such, it is therefore key to the success of computer…

Computer Vision and Pattern Recognition · Computer Science 2017-08-16 Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson

Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

In this paper, we address the problem of searching action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same…

Computer Vision and Pattern Recognition · Computer Science 2016-08-24 Nannan Li, Dan Xu, Zhenqiang Ying, Zhihao Li, Ge Li

Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling

Weakly-supervised action localization aims to recognize and localize action instances in untrimmed videos with only video-level labels. Most existing models rely on multiple instance learning (MIL), where the predictions of unlabeled…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Guiqin Wang, Peng Zhao, Cong Zhao, Shusen Yang, Jie Cheng, Luziwei Leng, Jianxing Liao, Qinghai Guo

Online Localization and Prediction of Actions and Interactions

This paper proposes a person-centric and online approach to the challenging problem of localization and prediction of actions and interactions in videos. Typically, localization or recognition is performed in an offline manner where all the…

Computer Vision and Pattern Recognition · Computer Science 2016-12-06 Khurram Soomro, Haroon Idrees, Mubarak Shah

Video Representation Learning by Recognizing Temporal Transformations

We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics. Our representations can be learned from data without human annotation and provide a substantial…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Simon Jenni, Givi Meishvili, Paolo Favaro

Consistency-based Self-supervised Learning for Temporal Anomaly Localization

This work tackles Weakly Supervised Anomaly detection, in which a predictor is allowed to learn not only from normal examples but also from a few labeled anomalies made available during training. In particular, we deal with the localization…

Computer Vision and Pattern Recognition · Computer Science 2022-08-11 Aniello Panariello, Angelo Porrello, Simone Calderara, Rita Cucchiara

Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos

Human behavior understanding in videos is a complex, still unsolved problem that requires accurately modeling motion at both the local (pixel-wise dense prediction) and global (aggregation of motion cues) levels. Current approaches based on…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 C. Spampinato, S. Palazzo, P. D'Oro, D. Giordano, M. Shah

Weakly Supervised Action Selection Learning in Video

Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Junwei Ma, Satya Krishna Gorti, Maksims Volkovs, Guangwei Yu

A flexible model for training action localization with varying levels of supervision

Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid