Related papers: Video Object Segmentation with Dynamic Query Modul…

Region Aware Video Object Segmentation with Deep Motion Modeling

Current semi-supervised video object segmentation (VOS) methods usually leverage the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we…

Computer Vision and Pattern Recognition · Computer Science 2022-07-22 Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation

We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations. Towards this goal, we present…

Computer Vision and Pattern Recognition · Computer Science 2018-11-06 Jonathon Luiten, Paul Voigtlaender, Bastian Leibe

Video Object Segmentation using Space-Time Memory Networks

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods…

Computer Vision and Pattern Recognition · Computer Science 2019-08-13 Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim

Efficient Video Object Segmentation via Modulated Cross-Attention Memory

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation. However, these approaches typically struggle on long videos due to increased GPU memory demands, as they frequently expand…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

In this work we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames which can…

Computer Vision and Pattern Recognition · Computer Science 2019-10-02 Kevin Duarte, Yogesh S Rawat, Mubarak Shah

Learning Spatial-Semantic Features for Robust Video Object Segmentation

Tracking and segmenting multiple similar objects with distinct or complex parts in long-term videos is particularly challenging due to the ambiguity in identifying target components and the confusion caused by occlusion, background clutter,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

PMVOS: Pixel-Level Matching-Based Video Object Segmentation

Semi-supervised video object segmentation (VOS) aims to segment arbitrary target objects in video when the ground truth segmentation mask of the initial frame is provided. Due to this limitation of using prior knowledge about the target…

Computer Vision and Pattern Recognition · Computer Science 2020-09-21 Suhwan Cho, Heansung Lee, Sungmin Woo, Sungjun Jang, Sangyoun Lee

LSMVOS: Long-Short-Term Similarity Matching for Video Object

Objective: Semi-supervised video object segmentation refers to segmenting the object in subsequent frames given the object label in the first frame. Existing algorithms are mostly based on the objectives of matching and propagation…

Computer Vision and Pattern Recognition · Computer Science 2020-09-03 Zhang Xuerui, Yuan Xia

Dual Temporal Memory Network for Efficient Video Object Segmentation

Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask on the first frame, the task of VOS is to track and segment the single or multiple objects of interest in the…

Computer Vision and Pattern Recognition · Computer Science 2020-03-16 Kaihua Zhang, Long Wang, Dong Liu, Bo Liu, Qingshan Liu, Zhu Li

TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. The task is to track class-agnostic objects from a given target mask. To do this, various approaches have been developed based on…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Hyojin Park, Ganesh Venkatesh, Nojun Kwak

FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution

Video Object Segmentation (VOS) is one of the most fundamental and challenging tasks in computer vision and has a wide range of applications. Most existing methods rely on spatiotemporal memory networks to extract frame-level features and…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Mengjiao Wang, Junpei Zhang, Xu Liu, Yuting Yang, Mengru Ma

FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and Temporally Consistent Single-Shot Video Object Segmentation

We consider the task of semi-supervised video object segmentation (VOS). Our approach mitigates shortcomings in previous VOS work by addressing detail preservation and temporal consistency using visual warping. In contrast to prior work…

Computer Vision and Pattern Recognition · Computer Science 2021-11-23 Julia Gong, F. Christopher Holsinger, Serena Yeung

Space-time Reinforcement Network for Video Object Segmentation

Recently, video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching to memory frames. Despite these methods having superior performance, they suffer from…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Yadang Chen, Wentao Zhu, Zhi-Xin Yang, Enhua Wu

Referring Video Object Segmentation with Cross-Modality Proxy Queries

Referring video object segmentation (RVOS) is an emerging cross-modality task that aims to generate pixel-level maps of the target objects referred by given textual expressions. The main concept involves learning an accurate alignment of…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Baoli Sun, Xinzhu Ma, Ning Wang, Zhihui Wang, Zhiyong Wang

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS),…

Computer Vision and Pattern Recognition · Computer Science 2024-06-21 Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation

Given an object mask, the Semi-supervised Video Object Segmentation (SVOS) technique aims to track and segment the object across video frames, serving as a fundamental task in computer vision. Although recent memory-based methods demonstrate…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Guanyi Qin, Ziyue Wang, Daiyun Shen, Haofeng Liu, Hantao Zhou, Junde Wu, Runze Hu, Yueming Jin

Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory

Reasoning Video Object Segmentation (ReasonVOS) is a challenging task that requires stable object segmentation across video sequences using implicit and complex textual inputs. Previous methods fine-tune Multimodal Large Language Models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Zhengtong Zhu, Jiaqing Fan, Zhixuan Liu, Fanzhang Li

Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation

Referring video object segmentation (RVOS) aims to identify, track and segment the objects in a video based on language descriptions, and has received great attention in recent years. However, existing datasets remain focused on short video…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Tianming Liang, Haichao Jiang, Yuting Yang, Chaolei Tan, Shuai Li, Wei-Shi Zheng, Jian-Fang Hu

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory is helpful for segmenting target objects in videos. However, they mainly focus on better matching between the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-19 Yong Liu, Ran Yu, Fei Yin, Xinyuan Zhao, Wei Zhao, Weihao Xia, Yujiu Yang

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang