Related papers: Long Activity Video Understanding using Functional…
Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges…
The interactions between human and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and then are fed to…
Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation…
The objective of this paper is a model that is able to discover, track and segment multiple moving objects in a video. We make four contributions: First, we introduce an object-centric segmentation model with a depth-ordered layer…
Recognizing human actions based on videos has became one of the most popular areas of research in computer vision in recent years. This area has many applications such as surveillance, robotics, health care, video search and human-computer…
Accurate video understanding involves reasoning about the relationships between actors, objects and their environment, often over long temporal intervals. In this paper, we propose a message passing graph neural network that explicitly…
We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model…
Current state-of-the-art video models process a video clip as a long sequence of spatio-temporal tokens. However, they do not explicitly model objects, their interactions across the video, and instead process all the tokens in the video. In…
A growing branch of computer vision is object detection. Object detection is used in many applications such as industrial process, medical imaging analysis, and autonomous vehicles. The ability to detect objects in videos is crucial. Object…
The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem…
In this paper we propose an end-to-end trainable deep neural network model for egocentric activity recognition. Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in…
Recognizing human actions is a vital task for a humanoid robot, especially in domains like programming by demonstration. Previous approaches on action recognition primarily focused on the overall prevalent action being executed, but we…
Deep neural networks have become the primary learning technique for object recognition. Videos, unlike still images, are temporally coherent which makes the application of deep networks non-trivial. Here, we investigate how motion can aid…
In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to…
This paper focuses on building object-centric representations for long-term action anticipation in videos. Our key motivation is that objects provide important cues to recognize and predict human-object interactions, especially when the…
Pixelwise annotation of image sequences can be very tedious for humans. Interactive video object segmentation aims to utilize automatic methods to speed up the process and reduce the workload of the annotators. Most contemporary approaches…
We investigate a human-like interpretable model of video understanding. Humans recognise complex activities in video by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object…
Video action recognition, a critical problem in video understanding, has been gaining increasing attention. To identify actions induced by complex object-object interactions, we need to consider not only spatial relations among objects in a…
Human activity recognition is one of the most important tasks in computer vision and has proved useful in different fields such as healthcare, sports training and security. There are a number of approaches that have been explored to solve…
We present a deep learning method for the interactive video object segmentation. Our method is built upon two core operations, interaction and propagation, and each operation is conducted by Convolutional Neural Networks. The two networks…