Related papers: Pointly-Supervised Action Localization

Spot On: Action Localization from Pointly-Supervised Proposals

We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Pascal Mettes , Jan C. van Gemert , Cees G. M. Snoek

Localizing Actions from Video Labels and Pseudo-Annotations

The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard to obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Pascal Mettes , Cees G. M. Snoek , Shih-Fu Chang

Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

The goal of this work is spatio-temporal action localization in videos, using only the supervision from video-level class labels. The state-of-the-art casts this weakly-supervised action localization regime as a Multiple Instance Learning…

Computer Vision and Pattern Recognition · Computer Science 2018-11-26 Pascal Mettes , Cees G. M. Snoek

Guess Where? Actor-Supervision for Spatiotemporal Action Localization

This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised…

Computer Vision and Pattern Recognition · Computer Science 2018-04-06 Victor Escorcia , Cuong D. Dao , Mihir Jain , Bernard Ghanem , Cees Snoek

Human Action Localization with Sparse Spatial Supervision

We introduce an approach for spatio-temporal human action localization using sparse spatial supervision. Our method leverages the large amount of annotated humans available today and extracts human tubes by combining a state-of-the-art…

Computer Vision and Pattern Recognition · Computer Science 2017-05-25 Philippe Weinzaepfel , Xavier Martin , Cordelia Schmid

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this task are usually trained with human-annotated sentences and bounding box…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Brian Chen , Nina Shvetsova , Andrew Rouditchenko , Daniel Kondermann , Samuel Thomas , Shih-Fu Chang , Rogerio Feris , James Glass , Hilde Kuehne

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse…

Computer Vision and Pattern Recognition · Computer Science 2020-12-16 Chen Ju , Peisen Zhao , Ya Zhang , Yanfeng Wang , Qi Tian

Action Localization through Continual Predictive Learning

The problem of action recognition involves locating the action in the video, both over time and spatially in the image. The dominant current approaches use supervised learning to solve this problem, and require large amounts of annotated…

Computer Vision and Pattern Recognition · Computer Science 2020-03-30 Sathyanarayanan N. Aakur , Sudeep Sarkar

Spatio-Temporal Action Localization in a Weakly Supervised Setting

Enabling computational systems with the ability to localize actions in video-based content has manifold applications. Traditionally, such a problem is approached in a fully-supervised setting where video-clips with complete frame-by-frame…

Computer Vision and Pattern Recognition · Computer Science 2019-05-07 Kurt Degiorgio , Fabio Cuzzolin

Tubelets: Unsupervised action proposals from spatiotemporal super-voxels

This paper considers the problem of localizing actions in videos as a sequence of bounding boxes. The objective is to generate action proposals that are likely to include the action of interest, ideally achieving high recall with few…

Computer Vision and Pattern Recognition · Computer Science 2016-07-08 Mihir Jain , Jan van Gemert , Hervé Jégou , Patrick Bouthemy , Cees G. M. Snoek

Proposal-based Temporal Action Localization with Point-level Supervision

Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data. Without temporal…

Computer Vision and Pattern Recognition · Computer Science 2023-10-10 Yuan Yin , Yifei Huang , Ryosuke Furuta , Yoichi Sato

Automatic Action Annotation in Weakly Labeled Videos

Manual spatio-temporal annotation of human action in videos is laborious, requires several annotators and contains human biases. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of…

Computer Vision and Pattern Recognition · Computer Science 2016-05-27 Waqas Sultani , Mubarak Shah

Zero-shot Natural Language Video Localization

Understanding videos to localize moments with natural language often requires large expensive annotated video regions paired with language queries. To eliminate the annotation costs, we make a first attempt to train a natural language video…

Computation and Language · Computer Science 2021-10-04 Jinwoo Nam , Daechul Ahn , Dongyeop Kang , Seong Jong Ha , Jonghyun Choi

Learning to track for spatio-temporal action localization

We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame-level and scores them with a combination of static and motion CNN features. It then tracks…

Computer Vision and Pattern Recognition · Computer Science 2015-09-29 Philippe Weinzaepfel , Zaid Harchaoui , Cordelia Schmid

Attentive Action and Context Factorization

We propose a method for human action recognition, one that can localize the spatiotemporal regions that 'define' the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Yang Wang , Vinh Tran , Gedas Bertasius , Lorenzo Torresani , Minh Hoai

A flexible model for training action localization with varying levels of supervision

Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Guilhem Chéron , Jean-Baptiste Alayrac , Ivan Laptev , Cordelia Schmid

Enhancing Single-Frame Supervision for Better Temporal Action Localization

Temporal action localization aims to identify the boundaries and categories of actions in videos, such as scoring a goal in a football match. Single-frame supervision has emerged as a labor-efficient way to train action localizers as it…

Human-Computer Interaction · Computer Science 2023-12-11 Changjian Chen , Jiashu Chen , Weikai Yang , Haoze Wang , Johannes Knittel , Xibin Zhao , Steffen Koch , Thomas Ertl , Shixia Liu

Exploring the Temporal Consistency for Point-Level Weakly-Supervised Temporal Action Localization

Point-supervised Temporal Action Localization (PTAL) adopts a lightly frame-annotated paradigm (i.e., labeling only a single frame per action instance) to train a model to effectively locate action instances within untrimmed…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Yunchuan Ma , Laiyun Qing , Guorong Li , Yuqing Liu , Yuankai Qi , Qingming Huang

Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

In this paper, we address the problem of searching action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same…

Computer Vision and Pattern Recognition · Computer Science 2016-08-24 Nannan Li , Dan Xu , Zhenqiang Ying , Zhihao Li , Ge Li

End-to-End Semi-Supervised Learning for Video Action Detection

In this work, we focus on semi-supervised learning for video action detection which utilizes both labeled as well as unlabeled data. We propose a simple end-to-end consistency based approach which effectively utilizes the unlabeled data.…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Akash Kumar , Yogesh Singh Rawat