Related papers: Spatial-Temporal Multi-level Association for Video…

Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation

This paper addresses the task of unsupervised video multi-object segmentation. Current approaches follow a two-stage paradigm: 1) detect object proposals using pre-trained Mask R-CNN, and 2) conduct generic feature matching for temporal…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Tianfei Zhou, Jianwu Li, Xueyi Li, Ling Shao

Learning Spatial-Semantic Features for Robust Video Object Segmentation

Tracking and segmenting multiple similar objects with distinct or complex parts in long-term videos is particularly challenging due to the ambiguity in identifying target components and the confusion caused by occlusion, background clutter,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory

Reasoning Video Object Segmentation (ReasonVOS) is a challenging task that requires stable object segmentation across video sequences using implicit and complex textual inputs. Previous methods fine-tune Multimodal Large Language Models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Zhengtong Zhu, Jiaqing Fan, Zhixuan Liu, Fanzhang Li

Efficient Spatial-Temporal Modeling for Real-Time Video Analysis: A Unified Framework for Action Recognition and Object Tracking

Real-time video analysis remains a challenging problem in computer vision, requiring efficient processing of both spatial and temporal information while maintaining computational efficiency. Existing approaches often struggle to balance…

Computer Vision and Pattern Recognition · Computer Science 2025-07-31 Shahla John

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D).…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

Video Object Segmentation using Space-Time Memory Networks

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods…

Computer Vision and Pattern Recognition · Computer Science 2019-08-13 Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim

Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

The goal of spatial-temporal action detection is to determine the time and place where each person's action occurs in a video and classify the corresponding action category. Most of the existing methods adopt fully-supervised learning,…

Computer Vision and Pattern Recognition · Computer Science 2023-09-21 Wei-Jhe Huang, Jheng-Hsien Yeh, Min-Hung Chen, Gueter Josmy Faure, Shang-Hong Lai

Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for video object segmentation. MAMP leverages the frame reconstruction task for training without the need for annotations. During…

Computer Vision and Pattern Recognition · Computer Science 2021-10-29 Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

STF: Spatio-Temporal Fusion Module for Improving Video Object Detection

Consecutive frames in a video contain redundancy, but they may also contain relevant complementary information for the detection task. The objective of our work is to leverage this complementary information to improve detection. Therefore,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-19 Noreen Anwar, Guillaume-Alexandre Bilodeau, Wassim Bouachir

Temporally Consistent Referring Video Object Segmentation with Hybrid Memory

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian

Space-time Reinforcement Network for Video Object Segmentation

Recently, video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching to memory frames. Despite these methods having superior performance, they suffer from…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Yadang Chen, Wentao Zhu, Zhi-Xin Yang, Enhua Wu

TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. This task is tracking class-agnostic objects from a given target mask. For doing this, various approaches have been developed based on…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Hyojin Park, Ganesh Venkatesh, Nojun Kwak

Dual Temporal Memory Network for Efficient Video Object Segmentation

Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask on the first frame, the task of VOS is to track and segment the single or multiple objects of interests in the…

Computer Vision and Pattern Recognition · Computer Science 2020-03-16 Kaihua Zhang, Long Wang, Dong Liu, Bo Liu, Qingshan Liu, Zhu Li

Learning Position and Target Consistency for Memory-based Video Object Segmentation

This paper studies the problem of semi-supervised video object segmentation (VOS). Multiple works have shown that memory-based approaches can be effective for video object segmentation. They are mostly based on pixel-level matching, both…

Computer Vision and Pattern Recognition · Computer Science 2021-04-12 Li Hu, Peng Zhang, Bang Zhang, Pan Pan, Yinghui Xu, Rong Jin

Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding

Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence. Currently, most existing grounding methods are restricted to well-aligned segment-sentence pairs. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai, Nicholas Jing Yuan

Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation

Memory-based video object segmentation methods model multiple objects over long temporal-spatial spans by establishing a memory bank, achieving remarkable performance. However, they struggle to overcome the false matching and are…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Jintu Zheng, Yun Liang, Yuqing Zhang, Wanchao Su

Learning Where to Focus for Efficient Video Object Detection

Transferring existing image-based detectors to video is non-trivial since frame quality is always degraded by partial occlusion, rare pose, and motion blur. Previous approaches propagate and aggregate features across…

Computer Vision and Pattern Recognition · Computer Science 2020-07-17 Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement

Video semantic segmentation aims to generate accurate semantic maps for each video frame. To this end, many works are dedicated to integrating diverse information from consecutive frames to enhance the features for prediction, where a feature…

Computer Vision and Pattern Recognition · Computer Science 2023-01-11 Jiafan Zhuang, Zilei Wang, Junjie Li

Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention

Video object segmentation is a fundamental research problem in computer vision. Recent techniques have often applied attention mechanism to object representation learning from video sequences. However, due to temporal changes in the video…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Quang-Trung Truong, Duc Thanh Nguyen, Binh-Son Hua, Sai-Kit Yeung

Learning Video Object Segmentation with Visual Memory

This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal…

Computer Vision and Pattern Recognition · Computer Science 2017-07-13 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid