Related papers: Audio-Visual Instance Segmentation

200 papers

Recently, an audio-visual instance segmentation (AVIS) task has been introduced, aiming to identify, segment and track individual sounding instances in videos. However, prevailing methods primarily adopt the offline paradigm, that cannot…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Yingjian Zhu, Ying Wang, Yuyang Hong, Ruohao Guo, Kun Ding, Xin Gu, Bin Fan, Shiming Xiang

The audio-visual segmentation (AVS) task aims to segment sounding objects from a given video. Existing works mainly focus on fusing audio and visual features of a given video to achieve sounding object masks. However, we observed that prior…

Sound · Computer Science 2023-08-02 Chen Liu, Peike Li, Xingqun Qi, Hu Zhang, Lincheng Li, Dadong Wang, Xin Yu

Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore the multimodal feature aggregation for video instance segmentation task, in which we…

Computer Vision and Pattern Recognition · Computer Science 2023-01-26 Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

Audiovisual instance segmentation (AVIS) requires accurately localizing and tracking sounding objects throughout video sequences. Existing methods suffer from visual bias stemming from two fundamental issues: uniform additive fusion…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-30 Jinbae Seo, Hyeongjun Kwon, Kwonyoung Kim, Jiyoung Lee, Kwanghoon Sohn

Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurate cross-modal alignment…

Computer Vision and Pattern Recognition · Computer Science 2024-08-15 Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Haochen Wang, Cilin Yan, Shuai Wang, Xiaolong Jiang, XU Tang, Yao Hu, Weidi Xie, Efstratios Gavves

Audio-Visual Segmentation (AVS) aims to identify and segment sound-producing objects in videos by leveraging both visual and audio modalities. It has emerged as a significant research area in multimodal perception, enabling fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Jia Li, Yapeng Tian

Recognizing the sounding objects in scenes is a longstanding objective in embodied AI, with diverse applications in robotics and AR/VR/MR. To that end, Audio-Visual Segmentation (AVS), taking as condition an audio signal to identify the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

Handling occlusion remains a significant challenge for video instance-level tasks like Multiple Object Tracking (MOT) and Video Instance Segmentation (VIS). In this paper, we propose a novel framework, Amodal-Aware Video Instance…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Minh Tran, Thang Pham, Winston Bounsavy, Tri Nguyen, Ngan Le

We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the…

Computer Vision and Pattern Recognition · Computer Science 2023-02-20 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose InstFormer, a…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Pinxue Guo, Tony Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Zhaoyu Chen, Wenqiang Zhang

Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously…

Computer Vision and Pattern Recognition · Computer Science 2022-05-18 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Audio-Visual Segmentation (AVS) aims to produce pixel-level masks of sound producing objects in videos, by jointly learning from audio and visual signals. However, real-world environments are inherently dynamic, causing audio and visual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Siddeshwar Raghavan, Gautham Vinod, Bruce Coburn, Fengqing Zhu

The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

Unlike traditional visual segmentation, audio-visual segmentation (AVS) requires the model not only to identify and segment objects but also to determine whether they are sound sources. Recent AVS approaches, leveraging transformer…

Sound · Computer Science 2025-02-24 Jia Li, Wenjie Zhao, Ziru Huang, Yunhui Guo, Yapeng Tian

Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding sources by predicting pixel-wise maps. Previous methods assume that each sound component in an audio signal always has a visual counterpart in the image.…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Chen Liu, Peike Li, Hu Zhang, Lincheng Li, Zi Huang, Dadong Wang, Xin Yu