Related papers: TIVE: A Toolbox for Identifying Video Instance Seg…

TIDE: A General Toolbox for Identifying Object Detection Errors

We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms. Importantly, our framework is applicable across datasets and can be applied directly to…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-02 · Daniel Bolya, Sean Foley, James Hays, Judy Hoffman

A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation

Contemporary Video Instance Segmentation (VIS) methods typically adhere to a pre-train then fine-tune regime, where a segmentation model trained on images is fine-tuned on videos. However, the lack of temporal knowledge in the pre-trained…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Qing Zhong, Peng-Tao Jiang, Wen Wang, Guodong Ding, Lin Wu, Kaiqi Huang

An End-to-End Framework for Video Multi-Person Pose Estimation

Video-based human pose estimation models aim to address scenarios that cannot be effectively solved by static image models such as motion blur, out-of-focus and occlusion. Most existing approaches consist of two stages: detecting human…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Zhihong Wei

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-25 · Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously. Extended from image set applications, video data additionally induces the temporal information, which, if handled…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-12 · Thuy C. Nguyen, Tuan N. Tang, Nam LH. Phan, Chuong H. Nguyen, Masayuki Yamazaki, Masao Yamanaka

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-25 · Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video. Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have their inherent…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-22 · Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Spatial-Temporal Multi-level Association for Video Object Segmentation

Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-10 · Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

Temporally Efficient Vision Transformer for Video Instance Segmentation

Recently, vision transformers have achieved tremendous success in image-level visual recognition tasks. To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-19 · Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-09 · Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have demonstrated impressive results in video object segmentation (VOS) recently. Nevertheless, due to the lack of…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-14 · Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-25 · Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

Towards Real-Time Open-Vocabulary Video Instance Segmentation

In this paper, we address the challenge of performing open-vocabulary video instance segmentation (OV-VIS) in real time. We analyze the computational bottlenecks of state-of-the-art foundation models that perform OV-VIS, and propose a new…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Bin Yan, Martin Sundermeyer, David Joseph Tan, Huchuan Lu, Federico Tombari

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-08 · Haochen Wang, Cilin Yan, Shuai Wang, Xiaolong Jiang, XU Tang, Yao Hu, Weidi Xie, Efstratios Gavves

Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Modern one-stage video instance segmentation networks suffer from two limitations. First, convolutional features are neither aligned with anchor boxes nor with ground-truth bounding boxes, reducing the mask sensitivity to spatial location.…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-13 · Minghan Li, Shuai Li, Lida Li, Lei Zhang

Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence. Most existing methods typically accomplish this task by employing a multi-stage top-down…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-25 · Jyoti Kini, Mubarak Shah

DVIS: Decoupled Video Instance Segmentation Framework

Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors.…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-17 · Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

VITA: Video Instance Segmentation via Object Token Association

We introduce a novel paradigm for offline Video Instance Segmentation (VIS), based on the hypothesis that explicit object-oriented information can be a strong clue for understanding the context of the entire sequence. To this end, we…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-21 · Miran Heo, Sukjun Hwang, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim