Related papers: Scene Separation & Data Selection: Temporal Segmen…
Autonomous systems, such as self-driving cars, rely on reliable semantic environment perception for decision making. Despite great advances in video semantic segmentation, existing approaches ignore important inductive biases and lack…
We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video…
State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. In recent work, both of these…
With ever increasing computing power and data storage capacity, the potential for large digital video libraries is growing rapidly.However, the massive use of video for the moment is limited by its opaque characteristics. Indeed, a user who…
The increasing number of surveillance cameras and security concerns have made automatic violent activity detection from surveillance footage an active area for research. Modern deep learning methods have achieved good accuracy in violence…
Robust scene segmentation and keyframe extraction are essential preprocessing steps in video understanding pipelines, supporting tasks such as indexing, summarization, and semantic retrieval. However, existing methods often lack…
This paper presents a method for automatic video object segmentation based on the fusion of motion stream, appearance stream, and instance-aware segmentation. The proposed scheme consists of a two-stream fusion network and an instance…
4D panoptic segmentation in a streaming setting is critical for highly dynamic environments, such as evacuating dense crowds and autonomous driving in complex scenarios, where real-time, fine-grained perception within a constrained time…
In this paper, we introduce Coarse-Fine Networks, a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional Video models process…
Motivated by the previous success of Two-Dimensional Convolutional Neural Network (2D CNN) on image recognition, researchers endeavor to leverage it to characterize videos. However, one limitation of applying 2D CNN to analyze videos is…
Ubiquitous sensors today emit high frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal…
We address the problem of capturing temporal information for video classification in 2D networks, without increasing their computational cost. Existing approaches focus on modifying the architecture of 2D networks (e.g. by including filters…
We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against…
In this dissertation, I present my work towards exploring temporal information for better video understanding. Specifically, I have worked on two problems: action recognition and semantic segmentation. For action recognition, I have…
A long-term video, such as a movie or TV show, is composed of various scenes, each of which represents a series of shots sharing the same semantic story. Spotting the correct scene boundary from the long-term video is a challenging task,…
Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame.…
In this paper, we present a detailed design of dynamic video segmentation network (DVSNet) for fast and efficient semantic video segmentation. DVSNet consists of two convolutional neural networks: a segmentation network and a flow network.…
Smoothing and filtering two-dimensional sequences are fundamental tasks in fields such as computer vision. Conventional filtering algorithms often rely on the selection of the filtering window, limiting their applicability in certain…
Semantic scene segmentation has primarily been addressed by forming representations of single images both with supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has begun to recently receive…
This paper proposes an efficient unsupervised method for detecting relevant changes between two temporally different images of the same scene. A convolutional neural network (CNN) for semantic segmentation is implemented to extract…