Related papers: A Generalized Framework for Video Instance Segment…

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-04 · De-An Huang, Zhiding Yu, Anima Anandkumar

DVIS: Decoupled Video Instance Segmentation Framework

Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors.…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-17 · Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

UVIS: Unsupervised Video Instance Segmentation

Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-12 · Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-08 · Haochen Wang, Cilin Yan, Shuai Wang, Xiaolong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves

In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have their inherent…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-22 · Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Crossover Learning for Fast Online Video Instance Segmentation

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-14 · Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

Video Instance Segmentation with a Propose-Reduce Paradigm

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and merge the incomplete results by tracking…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-01 · Huaijia Lin, Ruizheng Wu, Shu Liu, Jiangbo Lu, Jiaya Jia

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS takes a short video clip as input each time showing stronger performance than frame-level VIS…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-04 · Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni

A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation

Contemporary Video Instance Segmentation (VIS) methods typically adhere to a pre-train then fine-tune regime, where a segmentation model trained on images is fine-tuned on videos. However, the lack of temporal knowledge in the pre-trained…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Qing Zhong, Peng-Tao Jiang, Wen Wang, Guodong Ding, Lin Wu, Kaiqi Huang

SyncVIS: Synchronized Video Instance Segmentation

Recent DETR-based methods have advanced the development of Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information. Despite harvesting remarkable progress, existing…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing. However, the recent success of online methods questions this…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-19 · Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixé, Rakesh Ranjan

TCOVIS: Temporally Consistent Online Video Instance Segmentation

In recent years, significant progress has been made in video instance segmentation (VIS), with many offline and online methods achieving state-of-the-art performance. While offline methods have the advantage of producing temporally…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-22 · Junlong Li, Bingyao Yu, Yongming Rao, Jie Zhou, Jiwen Lu

End-to-End Video Instance Segmentation with Transformers

Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-11 · Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

OVSNet: Towards One-Pass Real-Time Video Object Segmentation

Video object segmentation aims at accurately segmenting the target object regions across consecutive frames. It is technically challenging to cope with complicated factors (e.g., shape deformations, occlusion and out of the lens). Recent…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-03 · Peng Sun, Peiwen Lin, Guangliang Cheng, Jianping Shi, Jiawan Zhang, Xi Li

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

Although deep learning methods have achieved advanced video object recognition performance in recent years, perceiving heavily occluded objects in a video is still a very challenging task. To promote the development of occlusion…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-16 · Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

OpenVIS: Open-vocabulary Video Instance Segmentation

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose InstFormer, a…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-20 · Pinxue Guo, Tony Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Zhaoyu Chen, Wenqiang Zhang

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-25 · Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously. Extended from image set applications, video data additionally induces the temporal information, which, if handled…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-12 · Thuy C. Nguyen, Tuan N. Tang, Nam LH. Phan, Chuong H. Nguyen, Masayuki Yamazaki, Masao Yamanaka

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-22 · Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu