Related papers: MISS: Memory-efficient Instance Segmentation Frame…

Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes

Instance segmentation is a fundamental task in computer vision with broad applications across various industries. In recent years, with the proliferation of deep learning and artificial intelligence applications, how to train effective…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Chih-Chung Hsu, Chia-Ming Lee, Ming-Shyen Wu

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have demonstrated impressive results in video object segmentation (VOS) recently. Nevertheless, due to the lack of…

Computer Vision and Pattern Recognition · Computer Science 2022-12-14 Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS takes a short video clip as input each time showing stronger performance than frame-level VIS…

Computer Vision and Pattern Recognition · Computer Science 2022-03-04 Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni

MISS: Multiclass Interpretable Scoring Systems

In this work, we present a novel, machine-learning approach for constructing Multiclass Interpretable Scoring Systems (MISS) - a fully data-driven methodology for generating single, sparse, and user-friendly scoring systems for multiclass…

Machine Learning · Computer Science 2024-01-11 Michal K. Grzeszczyk, Tomasz Trzciński, Arkadiusz Sitek

1st Place Solution for YouTubeVOS Challenge 2021:Video Instance Segmentation

Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously. Extended from image set applications, video data additionally induces the temporal information, which, if handled…

Computer Vision and Pattern Recognition · Computer Science 2021-07-12 Thuy C. Nguyen, Tuan N. Tang, Nam LH. Phan, Chuong H. Nguyen, Masayuki Yamazaki, Masao Yamanaka

Real-time Instance Segmentation of Surgical Instruments using Attention and Multi-scale Feature Fusion

Precise instrument segmentation aids surgeons to navigate the body more easily and increase patient safety. While accurate tracking of surgical instruments in real-time plays a crucial role in minimally invasive computer-assisted surgeries,…

Image and Video Processing · Electrical Eng. & Systems 2021-11-11 Juan Carlos Angeles-Ceron, Gilberto Ochoa-Ruiz, Leonardo Chang, Sharib Ali

Show and Segment: Universal Medical Image Segmentation via In-Context Learning

Medical image segmentation remains challenging due to the vast diversity of anatomical structures, imaging modalities, and segmentation tasks. While deep learning has made significant advances, current approaches struggle to generalize as…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Yunhe Gao, Di Liu, Zhuowei Li, Yunsheng Li, Dongdong Chen, Mu Zhou, Dimitris N. Metaxas

A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation

Contemporary Video Instance Segmentation (VIS) methods typically adhere to a pre-train then fine-tune regime, where a segmentation model trained on images is fine-tuned on videos. However, the lack of temporal knowledge in the pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Qing Zhong, Peng-Tao Jiang, Wen Wang, Guodong Ding, Lin Wu, Kaiqi Huang

Crossover Learning for Fast Online Video Instance Segmentation

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in…

Computer Vision and Pattern Recognition · Computer Science 2021-04-14 Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

MAIS: Memory-Attention for Interactive Segmentation

Interactive medical segmentation reduces annotation effort by refining predictions through user feedback. Vision Transformer (ViT)-based models, such as the Segment Anything Model (SAM), achieve state-of-the-art performance using user…

Computer Vision and Pattern Recognition · Computer Science 2025-05-13 Mauricio Orbes-Arteaga, Oeslle Lucena, Sabastien Ourselin, M. Jorge Cardoso

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance…

Computer Vision and Pattern Recognition · Computer Science 2022-08-04 De-An Huang, Zhiding Yu, Anima Anandkumar

Task-Specific Data Augmentation and Inference Processing for VIPriors Instance Segmentation Challenge

Instance segmentation is applied widely in image editing, image analysis and autonomous driving, etc. However, insufficient data is a common problem in practical applications. The Visual Inductive Priors (VIPriors) Instance Segmentation…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Bo Yan, Xingran Zhao, Yadong Li, Hongbin Wang

SyncVIS: Synchronized Video Instance Segmentation

Recent DETR-based methods have advanced the development of Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information. Despite harvesting remarkable progress, existing…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

MUSTAN: Multi-scale Temporal Context as Attention for Robust Video Foreground Segmentation

Video foreground segmentation (VFS) is an important computer vision task wherein one aims to segment the objects under motion from the background. Most of the current methods are image-based, i.e., rely only on spatial cues while ignoring…

Computer Vision and Pattern Recognition · Computer Science 2024-02-05 Praveen Kumar Pokala, Jaya Sai Kiran Patibandla, Naveen Kumar Pandey, Balakrishna Reddy Pailla

Channel Attention-Guided Cross-Modal Knowledge Distillation for Referring Image Segmentation

Referring image segmentation (RIS) requires accurate segmentation of target regions in images according to language descriptions, which is a cross-modal task integrating vision and language. Existing RIS methods typically employ large-scale…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Chen Yang

PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition

Referring Audio-Visual Segmentation (Ref-AVS) seeks to localize and segment target objects in video frames based on visual, auditory, and textual referring cues. The task is challenging because the relevance of different modalities varies…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Yuchen He, Jing Zhang

Insight Any Instance: Promptable Instance Segmentation for Remote Sensing Images

Instance segmentation of remote sensing images (RSIs) is an essential task for a wide range of applications such as land planning and intelligent transport. Instance segmentation of RSIs is constantly plagued by the unbalanced ratio of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Xuexue Li

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out…

Computer Vision and Pattern Recognition · Computer Science 2022-07-25 Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé