Related papers: Object Segmentation with Audio Context

Audio-Visual Instance Segmentation

In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in audible videos. To facilitate this…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-04 · Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang

Learning Video Object Segmentation from Static Images

Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce video object segmentation problem as a concept of guided instance segmentation. Our model proceeds on a per-frame basis, guided by the…

Computer Vision and Pattern Recognition · Computer Science · 2019-02-05 · Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung

Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics

The audio-visual segmentation (AVS) task aims to segment sounding objects from a given video. Existing works mainly focus on fusing audio and visual features of a given video to achieve sounding object masks. However, we observed that prior…

Sound · Computer Science · 2023-08-02 · Chen Liu, Peike Li, Xingqun Qi, Hu Zhang, Lincheng Li, Dadong Wang, Xin Yu

Deep Learning Techniques for Video Instance Segmentation: A Survey

Video instance segmentation, also known as multi-object tracking and segmentation, is an emerging computer vision research area introduced in 2019, aiming at detecting, segmenting, and tracking instances in videos simultaneously. By…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-20 · Chenhao Xu, Chang-Tsun Li, Yongjian Hu, Chee Peng Lim, Douglas Creighton

Video Object of Interest Segmentation

In this work, we present a new computer vision task named video object of interest segmentation (VOIS). Given a video and a target image of interest, our objective is to simultaneously segment and track all objects in the video that are…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Siyuan Zhou, Chunru Zhan, Biao Wang, Tiezheng Ge, Yuning Jiang, Li Niu

From Waveforms to Pixels: A Survey on Audio-Visual Segmentation

Audio-Visual Segmentation (AVS) aims to identify and segment sound-producing objects in videos by leveraging both visual and audio modalities. It has emerged as a significant research area in multimodal perception, enabling fine-grained…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Jia Li, Yapeng Tian

Submodular video object proposal selection for semantic object segmentation

Learning a data-driven spatio-temporal semantic representation of the objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Tinghuai Wang

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-12 · Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurate cross-modal alignment…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-15 · Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro

Instance Segmentation with Cross-Modal Consistency

Segmenting object instances is a key task in machine perception, with safety-critical applications in robotics and autonomous driving. We introduce a novel approach to instance segmentation that jointly leverages measurements from multiple…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Alex Zihao Zhu, Vincent Casser, Reza Mahjourian, Henrik Kretzschmar, Sören Pirk

CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation

The advancement of computer vision has pushed visual analysis tasks from still images to the video domain. In recent years, video instance segmentation, which aims to track and segment multiple objects in video frames, has drawn much…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-03 · Yiming Cui, Cheng Han, Dongfang Liu

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

Object Detection, Tracking, and Motion Segmentation for Object-level Video Segmentation

We present an approach for object segmentation in videos that combines frame-level object detection with concepts from object tracking and motion segmentation. The approach extracts temporally consistent object tubes based on an…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-11 · Benjamin Drayer, Thomas Brox

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-08 · Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

Segment Anything Model (SAM) has recently shown its powerful effectiveness in visual segmentation tasks. However, there is less exploration concerning how SAM works on audio-visual tasks, such as visual sound localization and segmentation.…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-04 · Shentong Mo, Yapeng Tian

Self-Supervised Audio-Visual Co-Segmentation

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-22 · Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba

Flow-free Video Object Segmentation

Segmenting foreground object from a video is a challenging task because of the large deformations of the objects, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient approach for…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-30 · Aditya Vora, Shanmuganathan Raman

Automatic Video Object Segmentation via Motion-Appearance-Stream Fusion and Instance-aware Segmentation

This paper presents a method for automatic video object segmentation based on the fusion of motion stream, appearance stream, and instance-aware segmentation. The proposed scheme consists of a two-stream fusion network and an instance…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-04 · Sungkwon Choo, Wonkyo Seo, Nam Ik Cho

A2VIS: Amodal-Aware Approach to Video Instance Segmentation

Handling occlusion remains a significant challenge for video instance-level tasks like Multiple Object Tracking (MOT) and Video Instance Segmentation (VIS). In this paper, we propose a novel framework, Amodal-Aware Video Instance…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-11 · Minh Tran, Thang Pham, Winston Bounsavy, Tri Nguyen, Ngan Le