Related papers: Efficient data-driven encoding of scene motion usi…
A Scene, represented visually using different formats such as RGB-D, LiDAR scan, keypoints, rectangular, spherical, multi-views, etc., contains information implicitly embedded relevant to applications such as scene indexing, vision-based…
Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real world applications, a dynamic scene is commonly captured by a moving camera (i.e., panning, tilting or…
Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By…
Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle visual tasks in challenging scenarios. However, due to…
We present a method that leverages the complementarity of event cameras and standard cameras to track visual features with low-latency. Event cameras are novel sensors that output pixel-level brightness changes, called "events". They offer…
Depth information is useful for many applications. Active depth sensors are appealing because they obtain dense and accurate depth maps. However, due to issues that range from power constraints to multi-sensor interference, these sensors…
Background subtraction has been a driving engine for many computer vision and video analytics tasks. Although its many variants exist, they all share the underlying assumption that photometric scene properties are either static or exhibit…
Conventional rendering techniques are primarily designed and optimized for single-frame rendering. In practical applications, such as scene editing and animation rendering, users frequently encounter scenes where only a small portion is…
This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true…
Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution,…
We introduce the concept of "dynamic image", a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or…
Saliency maps are used to understand human attention and visual fixation. However, while very well established for static images, there is no general agreement on how to compute a saliency map of dynamic scenes. In this paper we propose a…
Scene graphs are a powerful structured representation of the underlying content of images, and embeddings derived from them have been shown to be useful in multiple downstream tasks. In this work, we employ a graph convolutional network to…
Motion is an important signal for agents in dynamic environments, but learning to represent motion from unlabeled video is a difficult and underconstrained problem. We propose a model of motion based on elementary group properties of…
A robust and efficient anomaly detection technique is proposed, capable of dealing with crowded scenes where traditional tracking based approaches tend to fail. Initial foreground segmentation of the input frames confines the analysis to…
Selective attention is an essential mechanism to filter sensory input and to select only its most important components, allowing the capacity-limited cognitive structures of the brain to process them in detail. The saliency map model,…
Visual object tracking under challenging conditions of motion and light can be hindered by the capabilities of conventional cameras, prone to producing images with motion blur. Event cameras are novel sensors suited to robustly perform…
In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share…
Extracting informative representations from videos is fundamental for effectively learning various downstream tasks. We present a novel approach for unsupervised learning of meaningful representations from videos, leveraging the concept of…
Visual information plays an indispensable role in our daily interactions with environment. Such information is manipulated for a wide range of purposes spanning from basic object and material perception to complex gesture interpretations.…