Related papers: Prediction-Tracking-Segmentation
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a…
We propose a light-weight variational framework for online tracking of object segmentations in videos based on optical flow and image boundaries. While high-end computer vision methods on this task rely on sequence specific training of…
We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object. In each video frame, we compute segment proposals using multiple…
We present an approach for object segmentation in videos that combines frame-level object detection with concepts from object tracking and motion segmentation. The approach extracts temporally consistent object tubes based on an…
We present an approach to semi-supervised video object segmentation, in the context of the DAVIS 2017 challenge. Our approach combines category-based object detection, category-independent object appearance segmentation and temporal object…
Visual object tracking is the problem of predicting a target object's state in a video. Generally, bounding-boxes have been used to represent states, and a surge of effort has been spent by the community to produce efficient causal…
Visual object tracking (VOT) is an essential component for many applications, such as autonomous driving or assistive robotics. However, recent works tend to develop accurate systems based on more computationally expensive feature…
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics. In this work, we propose a simple yet effective framework that can efficiently predict plausible future states. The key…
Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a task of computer vision uses video as input, various information used for prediction, such as the…
Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to…
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing…
The ability to model the underlying dynamics of visual scenes and reason about the future is central to human intelligence. Many attempts have been made to empower intelligent systems with such physical understanding and prediction…
This paper presents a novel approach for segmenting moving objects in unconstrained environments using guided convolutional neural networks. This guiding process relies on foreground masks from independent algorithms (i.e. state-of-the-art…
Video anticipation is the task of predicting one/multiple future representation(s) given limited, partial observation. This is a challenging task due to the fact that given limited observation, the future representation can be highly…
Occlusion is a long-standing problem that causes many modern tracking methods to be erroneous. In this paper, we address the occlusion problem by exploiting the current and future possible locations of the target object from its past…
Online video object segmentation is a challenging task as it entails to process the image sequence timely and accurately. To segment a target object through the video, numerous CNN-based methods have been developed by heavily finetuning on…
Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not…
We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a…
Visual tracking fundamentally involves regressing the state of the target in each frame of a video. Despite significant progress, existing regression-based trackers still tend to experience failures and inaccuracies. To enhance the…
In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. Detection-based algorithms are widely…