Related papers: SeqFormer: Sequential Transformer for Video Instan…

InstanceFormer: An Online Video Instance Segmentation Framework

Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-28 · Rajat Koner, Tanveer Hannan, Suprosanna Shit, Sahand Sharifzadeh, Matthias Schubert, Thomas Seidl, Volker Tresp

End-to-End Video Instance Segmentation with Transformers

Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-11 · Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

Video instance segmentation aims at predicting object segmentation masks for each frame, as well as associating the instances across multiple frames. Recent end-to-end video instance segmentation methods are capable of performing object…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-15 · Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-25 · Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

Mask2Former for Video Instance Segmentation

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-21 · Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-13 · Gedas Bertasius, Lorenzo Torresani

Superpoint Transformer for 3D Scene Instance Segmentation

Most existing methods realize 3D instance segmentation by extending those models used for 3D object detection or 3D semantic segmentation. However, these non-straightforward methods suffer from two drawbacks: 1) Imprecise bounding boxes or…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-30 · Jiahao Sun, Chunmei Qing, Junpeng Tan, Xiangmin Xu

OpenVIS: Open-vocabulary Video Instance Segmentation

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose InstFormer, a…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-20 · Pinxue Guo, Tony Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Zhaoyu Chen, Wenqiang Zhang

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

Language as Queries for Referring Video Object Segmentation

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames. In this work, we propose a simple and unified framework built upon…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-15 · Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them by either additional tracking heads or complex…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-06 · Fei He, Haoyang Zhang, Naiyu Gao, Jian Jia, Yanhu Shan, Xin Zhao, Kaiqi Huang

Temporally Efficient Vision Transformer for Video Instance Segmentation

Recently vision transformer has achieved tremendous success on image-level visual recognition tasks. To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-19 · Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

Video Instance Segmentation using Inter-Frame Communication Transformers

We propose a novel end-to-end solution for video instance segmentation (VIS) based on transformers. Recently, the per-clip pipeline shows superior performance over per-frame methods leveraging richer information from multiple frames.…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim

Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation

Recently, transformer-based image segmentation methods have achieved notable success against previous solutions. While for video domains, how to effectively model temporal context with the attention of object instances across frames remains…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-08 · Xiang Li, Jinglu Wang, Xiao Li, Yan Lu

Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Video instance segmentation aims to detect, segment, and track objects in a video. Current approaches extend image-level segmentation algorithms to the temporal domain. However, this results in temporally inconsistent masks. In this work,…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-15 · Anirudh S Chakravarthy, Won-Dong Jang, Zudi Lin, Donglai Wei, Song Bai, Hanspeter Pfister

Efficient Video Segmentation Models with Per-frame Inference

Most existing real-time deep models trained with each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations in the video sequence into…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-28 · Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-04 · De-An Huang, Zhiding Yu, Anima Anandkumar

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-29 · Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-25 · Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

Video Instance Segmentation with a Propose-Reduce Paradigm

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and merge the incomplete results by tracking…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-01 · Huaijia Lin, Ruizheng Wu, Shu Liu, Jiangbo Lu, Jiaya Jia