Related papers: General and Task-Oriented Video Segmentation

FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos

We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel level segmentation masks for all prominent objects in videos. We formulate…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Suyog Dutt Jain , Bo Xiong , Kristen Grauman

TarViS: A Unified Approach for Target-based Video Segmentation

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually…

Computer Vision and Pattern Recognition · Computer Science 2023-05-11 Ali Athar , Alexander Hermans , Jonathon Luiten , Deva Ramanan , Bastian Leibe

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories of text-based descriptions, which popularizes the segmentation system to more general-purpose application scenarios. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Jie Qin , Jie Wu , Pengxiang Yan , Ming Li , Ren Yuxi , Xuefeng Xiao , Yitong Wang , Rui Wang , Shilei Wen , Xin Pan , Xingang Wang

OMG-Seg: Is One Model Good Enough For All Segmentation?

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-02 Xiangtai Li , Haobo Yuan , Wei Li , Henghui Ding , Size Wu , Wenwei Zhang , Yining Li , Kai Chen , Chen Change Loy

A Class-wise Non-salient Region Generalized Framework for Video Semantic Segmentation

Video semantic segmentation (VSS) is beneficial for dealing with dynamic scenes due to the continuous property of the real-world environment. On the one hand, some methods alleviate the predicted inconsistent problem between continuous…

Computer Vision and Pattern Recognition · Computer Science 2023-01-02 Yuhang Zhang , Shishun Tian , Muxin Liao , Zhengyu Zhang , Wenbin Zou , Chen Xu

A Survey on Deep Learning Technique for Video Segmentation

Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to…

Computer Vision and Pattern Recognition · Computer Science 2022-11-30 Tianfei Zhou , Fatih Porikli , David Crandall , Luc Van Gool , Wenguan Wang

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Boosted by Multi-modal Large Language Models (MLLMs), text-guided universal segmentation models for the image and video domains have made rapid progress recently. However, these methods are often developed separately for specific domains,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Cong Wei , Yujie Zhong , Haoxian Tan , Yingsen Zeng , Yong Liu , Zheng Zhao , Yujiu Yang

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

The segmentation task has traditionally been formulated as a complete-label pixel classification task to predict a class for each pixel from a fixed number of predefined semantic categories shared by all images or videos. Yet, following…

Computer Vision and Pattern Recognition · Computer Science 2022-07-21 Haodi He , Yuhui Yuan , Xiangyu Yue , Han Hu

A Unified Framework for 3D Scene Understanding

We propose UniSeg3D, a unified 3D scene understanding framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary segmentation tasks within a single model. Most previous 3D segmentation approaches are…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Wei Xu , Chunsheng Shi , Sifan Tu , Xin Zhou , Dingkang Liang , Xiang Bai

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite significant progress in current unified segmentation methods,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Cong Wei , Yujie Zhong , Haoxian Tan , Yong Liu , Zheng Zhao , Jie Hu , Yujiu Yang

RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video

Accurate robot segmentation is a fundamental capability for robotic perception. It enables precise visual servoing for VLA systems, scalable robot-centric data augmentation, accurate real-to-sim transfer, and reliable safety monitoring in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Haiyang Mei , Qiming Huang , Hai Ci , Mike Zheng Shou

UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes

Instruction-driven segmentation in remote sensing generates masks from guidance, offering great potential for accessible and generalizable applications. However, existing methods suffer from fragmented task formulations and limited…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Shuo Ni , Di Wang , He Chen , Haonan Guo , Ning Zhang , Jing Zhang

GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction

Robots operating in unstructured environments often require accurate and consistent object-level representations. This typically requires segmenting individual objects from the robot's surroundings. While recent large models such as Segment…

Robotics · Computer Science 2025-04-07 Haozhan Tang , Tianyi Zhang , Oliver Kroemer , Matthew Johnson-Roberson , Weiming Zhi

AVSegFormer: Audio-Visual Segmentation with Transformer

The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Shengyi Gao , Zhe Chen , Guo Chen , Wenhai Wang , Tong Lu

VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning

Traditional video reasoning segmentation methods rely on supervised fine-tuning, which limits generalization to out-of-distribution scenarios and lacks explicit reasoning. To address this, we propose VideoSeg-R1, the first…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 Zishan Xu , Yifu Guo , Yuquan Lu , Fengyu Yang , Junxin Li

Multimodal Referring Segmentation: A Survey

Multimodal referring segmentation aims to segment target objects in visual scenes, such as images, videos, and 3D scenes, based on referring expressions in text or audio format. This task plays a crucial role in practical applications…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Henghui Ding , Song Tang , Shuting He , Chang Liu , Zuxuan Wu , Yu-Gang Jiang

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Haochen Wang , Cilin Yan , Shuai Wang , Xiaolong Jiang , Xu Tang , Yao Hu , Weidi Xie , Efstratios Gavves

A Generalized Framework for Video Instance Segmentation

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Miran Heo , Sukjun Hwang , Jeongseok Hyun , Hanjung Kim , Seoung Wug Oh , Joon-Young Lee , Seon Joo Kim

SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While its vanilla representation is mainly designed for view synthesis, recent works extended it to scene understanding with language…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Siyun Liang , Sen Wang , Kunyi Li , Michael Niemeyer , Stefano Gasperini , Hendrik P. A. Lensch , Nassir Navab , Federico Tombari

Universal Segmentation at Arbitrary Granularity with Language Instruction

This paper aims to achieve universal segmentation of arbitrary semantic level. Despite significant progress in recent years, specialist segmentation approaches are limited to specific tasks and data distribution. Retraining a new model for…

Computer Vision and Pattern Recognition · Computer Science 2024-11-27 Yong Liu , Cairong Zhang , Yitong Wang , Jiahao Wang , Yujiu Yang , Yansong Tang