Related papers: Unsupervised Audio-Visual Segmentation with Modali…

200 papers

Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level. Existing AVS methods require fine-grained annotations of audio-mask pairs in a supervised learning fashion. This limits their…

Computer Vision and Pattern Recognition · Computer Science 2023-09-14 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu

Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurate cross-modal alignment…

Computer Vision and Pattern Recognition · Computer Science 2024-08-15 Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. To tackle the task, it involves a comprehensive consideration of both the data…

Computer Vision and Pattern Recognition · Computer Science 2023-10-10 Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie

Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Shentong Mo, Bhiksha Raj

Audio-Visual Semantic Segmentation (AVSS) aligns audio and video at the pixel level but requires costly per-frame annotations. We introduce Weakly Supervised Audio-Visual Semantic Segmentation (WSAVSS), which uses only video-level labels to…

Multimedia · Computer Science 2026-03-24 Chengzhi Li, Heyan Huang, Ping Jian, Yanghao Zhou

Audio-Visual Segmentation (AVS) aims to identify and segment sound-producing objects in videos by leveraging both visual and audio modalities. It has emerged as a significant research area in multimodal perception, enabling fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Jia Li, Yapeng Tian

The audio-visual segmentation (AVS) task aims to segment sounding objects from a given video. Existing works mainly focus on fusing audio and visual features of a given video to achieve sounding object masks. However, we observed that prior…

Sound · Computer Science 2023-08-02 Chen Liu, Peike Li, Xingqun Qi, Hu Zhang, Lincheng Li, Dadong Wang, Xin Yu

Audio-Visual Segmentation (AVS) aims to produce pixel-level masks of sound-producing objects in videos by jointly learning from audio and visual signals. However, real-world environments are inherently dynamic, causing audio and visual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Siddeshwar Raghavan, Gautham Vinod, Bruce Coburn, Fengqing Zhu

Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and direct audio-visual alignment and fusion,…

Machine Learning · Computer Science 2026-03-31 Shengkai Chen, Yifang Yin, Jinming Cao, Shili Xiang, Zhenguang Liu, Roger Zimmermann

How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the…

Computer Vision and Pattern Recognition · Computer Science 2023-02-20 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

The primary aim of Audio-Visual Segmentation (AVS) is to precisely identify and locate auditory elements within visual scenes by accurately predicting segmentation masks at the pixel level. Achieving this involves comprehensively…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Khanh-Binh Nguyen, Chae Jung Park

The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

Audiovisual segmentation (AVS) aims to identify visual regions corresponding to sound sources, playing a vital role in video understanding, surveillance, and human-computer interaction. Traditional AVS methods depend on large-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Seung-jae Lee, Paul Hongsuck Seo

Audio-visual segmentation (AVS) aims to segment sound sources in the video sequence, requiring a pixel-level understanding of audio-visual correspondence. As the Segment Anything Model (SAM) has strongly impacted extensive fields of dense…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Juhyeong Seon, Woobin Im, Sebin Lee, Jumin Lee, Sung-Eui Yoon

The goal of Audio-Visual Segmentation (AVS) is to localize and segment the sounding source objects from video frames. Research on AVS suffers from data scarcity due to the high cost of fine-grained manual annotations. Recent works attempt…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Kyungbok Lee, You Zhang, Zhiyao Duan

We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first…

Computer Vision and Pattern Recognition · Computer Science 2023-01-31 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

Unlike traditional visual segmentation, audio-visual segmentation (AVS) requires the model not only to identify and segment objects but also to determine whether they are sound sources. Recent AVS approaches, leveraging transformer…

Sound · Computer Science 2025-02-24 Jia Li, Wenjie Zhao, Ziru Huang, Yunhui Guo, Yapeng Tian

Weakly-supervised audio-visual video parsing (AVVP) seeks to detect audible, visible, and audio-visual events without temporal annotations. Previous work has emphasized refining global predictions through contrastive or collaborative…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Yaru Chen, Ruohao Guo, Liting Gao, Yang Xiang, Qingyu Luo, Zhenbo Li, Wenwu Wang

Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame, which is represented by a pixel-wise segmentation mask for application scenarios such as multi-modal video editing, augmented reality, and intelligent…

Image and Video Processing · Electrical Eng. & Systems 2024-12-25 Zhaofeng Shi, Qingbo Wu, Fanman Meng, Linfeng Xu, Hongliang Li