CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee; Jiwan Seo; Kiljoon Han; Minwoo Choi; Sunghoon Im

Computer Vision and Pattern Recognition 2025-07-10 v2

Abstract

In this paper, we introduce Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy. Additionally, we design the Prototypical Cross-frame Contrastive (PCC) loss, which enforces consistency of object-level features across frames and thereby significantly improves matching accuracy. CAVIS outperforms state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, known for its particularly challenging videos. Project page: https://seung-hun-lee.github.io/projects/CAVIS/
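
The abstract does not state the form of the PCC loss. As a rough illustration of what a cross-frame contrastive objective on object-level features can look like, the sketch below uses a generic InfoNCE-style loss over one prototype vector per instance. The function name, the temperature value, and the assumption that row i of both arrays describes the same instance are illustrative choices, not the paper's formulation.

```python
import numpy as np


def cross_frame_contrastive_loss(protos_t, protos_t1, temperature=0.1):
    """Generic InfoNCE-style loss between prototypes of two frames.

    protos_t, protos_t1: arrays of shape (N, D). Row i of both arrays is
    ASSUMED to describe the same instance in two different frames.
    This is an illustrative stand-in, not the PCC loss from the paper.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = protos_t / np.linalg.norm(protos_t, axis=1, keepdims=True)
    b = protos_t1 / np.linalg.norm(protos_t1, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) cross-frame similarities
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i is column i (same instance, other frame).
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical data: 4 instances with 16-dimensional prototypes.
rng = np.random.default_rng(0)
protos = rng.normal(size=(4, 16))
noisy = protos + 0.01 * rng.normal(size=(4, 16))

consistent = cross_frame_contrastive_loss(protos, noisy)       # matched identities
shuffled = cross_frame_contrastive_loss(protos, protos[::-1])  # mismatched identities
```

Under these assumptions the loss is low when each instance's prototype stays close to itself across frames and high when identities are mismatched, which is the kind of cross-frame consistency the abstract attributes to the PCC loss.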

Keywords

video segmentation, video retrieval, video understanding

Cite

@article{arxiv.2407.03010,
  title  = {CAVIS: Context-Aware Video Instance Segmentation},
  author = {Seunghun Lee and Jiwan Seo and Kiljoon Han and Minwoo Choi and Sunghoon Im},
  journal= {arXiv preprint arXiv:2407.03010},
  year   = {2025}
}

Comments

ICCV 2025. Code: https://github.com/Seung-Hun-Lee/CAVIS
