Related papers: UVIS: Unsupervised Video Instance Segmentation

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance…

Computer Vision and Pattern Recognition · Computer Science 2022-08-04 De-An Huang, Zhiding Yu, Anima Anandkumar

Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training

Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching

We propose FlowCut, a simple and capable method for unsupervised video instance segmentation consisting of a three-stage framework to construct a high-quality video dataset with pseudo labels. To our knowledge, our work is the first attempt…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Alp Eren Sari, Paolo Favaro

AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment

Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos

Learning to Track Instances without Video Annotations

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches. To resolve…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out…

Computer Vision and Pattern Recognition · Computer Science 2022-07-25 Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

OpenVIS: Open-vocabulary Video Instance Segmentation

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose InstFormer, a…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Pinxue Guo, Tony Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Zhaoyu Chen, Wenqiang Zhang

UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking

We address Unsupervised Video Object Segmentation (UVOS), the task of automatically generating accurate pixel masks for salient objects in a video sequence and of tracking these objects consistently through time, without any input about…

Computer Vision and Pattern Recognition · Computer Science 2020-01-16 Jonathon Luiten, Idil Esen Zulfikar, Bastian Leibe

A Generalized Framework for Video Instance Segmentation

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

Video Instance Segmentation in an Open-World

Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. Open-world formulation relaxes the closed-world…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

PM-VIS: High-Performance Box-Supervised Video Instance Segmentation

Labeling pixel-wise object masks in videos is a resource-intensive and laborious process. Box-supervised Video Instance Segmentation (VIS) methods have emerged as a viable solution to mitigate the labor-intensive annotation process. In…

Computer Vision and Pattern Recognition · Computer Science 2024-04-23 Zhangjing Yang, Dun Liu, Wensheng Cheng, Jinqiao Wang, Yi Wu

U-Net Based Multi-instance Video Object Segmentation

Multi-instance video object segmentation segments specific instances throughout a video sequence at the pixel level, given only an annotated first frame. In this paper, we implement an effective fully convolutional network with U-Net…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Heguang Liu, Jingle Jiang

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

The current state-of-the-art methods for unsupervised video object segmentation (UVOS) require extensive training on video datasets with mask annotations, limiting their effectiveness in handling challenging scenarios. However, the Segment…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Zhenghao Zhang, Shengfan Zhang, Zhichao Wei, Zuozhuo Dai, Siyu Zhu

BoxVIS: Video Instance Segmentation with Box Annotations

It is expensive and labour-intensive to label the pixel-wise object masks in a video. As a result, the amount of pixel-wise annotations in existing video instance segmentation (VIS) datasets is small, limiting the generalization capability…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Minghan Li, Lei Zhang

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

End-to-End Video Instance Segmentation with Transformers

Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we…

Computer Vision and Pattern Recognition · Computer Science 2021-10-11 Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Tsanet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation

Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance. In recent works, two approaches for UVOS have been discussed that can be divided into:…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Seunghoon Lee, Suhwan Cho, Dogyoon Lee, Minhyeok Lee, Sangyoun Lee

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy

We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference. Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Long Lian, Zhirong Wu, Stella X. Yu