Related papers: Actor-Centric Relation Network
Localizing persons and recognizing their actions from videos is a challenging task towards high-level video understanding. Recent advances have been achieved by modeling direct pairwise relations between entities. In this paper, we take one…
Action detection is a challenging video understanding task, requiring modeling spatio-temporal and interaction relations. Current methods usually model actor-actor and actor-context relations separately, ignoring their complementarity and…
This paper focuses on multi-person action forecasting in videos. More precisely, given a history of H previous frames, the goal is to detect actors and to predict their future actions for the next T frames. Our approach jointly models…
Modeling relation between actors is important for recognizing group activity in a multi-person scene. This paper aims at learning discriminative relation between actors efficiently using deep models. To this end, we propose to build a…
In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the…
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. There are many real-world challenges in those datasets, such as composite action, co-occurring action, and high temporal…
Action detection plays an important role in high-level video understanding and media interpretation. Many existing studies fulfill this spatio-temporal localization by modeling the context, capturing the relationship of actors, objects, and…
Despite the notable progress made in action recognition tasks, not much work has been done in action recognition specifically for human-robot interaction. In this paper, we deeply explore the characteristics of the action recognition task…
Recently, pose-based action recognition has gained more and more attention due to the better performance compared with traditional appearance-based methods. However, there still exist two problems to be further solved. First, existing…
Research in action detection has grown in the recentyears, as it plays a key role in video understanding. Modelling the interactions (either spatial or temporal) between actors and their context has proven to be essential for this task.…
Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the…
The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally,…
Most existing Convolutional Neural Networks(CNNs) used for action recognition are either difficult to optimize or underuse crucial temporal information. Inspired by the fact that the recurrent model consistently makes breakthroughs in the…
Event perception tasks such as recognizing and localizing actions in streaming videos are essential for scaling to real-world application contexts. We tackle the problem of learning actor-centered representations through the notion of…
While observing complex events with multiple actors, humans do not assess each actor separately, but infer from the context. The surrounding context provides essential information for understanding actions. To this end, we propose to…
We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we…
Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning based networks often require significant computation, and…
Recent graph convolutional neural networks (GCNs) have shown high performance in the field of human action recognition by using human skeleton poses. However, it fails to detect human-object interaction cases successfully due to the lack of…
This thesis focuses on video understanding for human action and interaction recognition. We start by identifying the main challenges related to action recognition from videos and review how they have been addressed by current methods. Based…
We propose a method for human action recognition, one that can localize the spatiotemporal regions that `define' the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual…