Related papers: SyncVIS: Synchronized Video Instance Segmentation

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-25 · Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously. Extended from image set applications, video data additionally induces the temporal information, which, if handled…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-12 · Thuy C. Nguyen, Tuan N. Tang, Nam LH. Phan, Chuong H. Nguyen, Masayuki Yamazaki, Masao Yamanaka

A Generalized Framework for Video Instance Segmentation

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-27 · Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

Two-Level Temporal Relation Model for Online Video Instance Segmentation

In Video Instance Segmentation (VIS), current approaches either focus on the quality of the results, by taking the whole video as input and processing it offline; or on speed, by handling it frame by frame at the cost of competitive…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-01 · Çağan Selim Çoban, Oğuzhan Keskin, Jordi Pont-Tuset, Fatma Güney

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS takes a short video clip as input each time showing stronger performance than frame-level VIS…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-04 · Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni

DVIS: Decoupled Video Instance Segmentation Framework

Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors.…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-17 · Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-04 · De-An Huang, Zhiding Yu, Anima Anandkumar

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video. Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration

Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years. In this paper, we present a novel weakly supervised framework with…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-16 · Liqi Yan, Qifan Wang, Siqi Ma, Jingang Wang, Changbin Yu

CTVIS: Consistent Training for Online Video Instance Segmentation

The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS). Instance embedding learning is directly supervised by the contrastive loss computed upon the…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-25 · Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-22 · Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation

Contemporary Video Instance Segmentation (VIS) methods typically adhere to a pre-train then fine-tune regime, where a segmentation model trained on images is fine-tuned on videos. However, the lack of temporal knowledge in the pre-trained…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Qing Zhong, Peng-Tao Jiang, Wen Wang, Guodong Ding, Lin Wu, Kaiqi Huang

Video Instance Segmentation with a Propose-Reduce Paradigm

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and merge the incomplete results by tracking…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-01 · Huaijia Lin, Ruizheng Wu, Shu Liu, Jiangbo Lu, Jiaya Jia

Crossover Learning for Fast Online Video Instance Segmentation

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-14 · Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

CAVIS: Context-Aware Video Instance Segmentation

In this paper, we introduce the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-10 · Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im

End-to-End Video Instance Segmentation with Transformers

Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-11 · Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance. However, online methods have their inherent…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-22 · Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this report, we present further improvements to the SOTA VIS method,…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-29 · Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan