Related papers

Related papers: Multi-scale Multi-instance Visual Sound Localizati…

200 papers

Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training. Previous works often seek high audio-visual similarities for likely positive…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Shentong Mo, Pedro Morgado

Audio-Visual Source Localization (AVSL) aims to localize the source of sound within a video. In this paper, we identify a significant issue in existing benchmarks: the sounding objects are often easily recognized based solely on visual…

Multimedia · Computer Science 2024-09-12 Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin

Visually localizing multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking. To solve this problem, we develop a two-stage audiovisual learning framework…

Computer Vision and Pattern Recognition · Computer Science 2020-07-15 Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin

Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and…

Sound · Computer Science 2022-04-12 Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello

The audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video. Existing self-supervised and weakly supervised source localization methods struggle to accurately…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Tanvir Mahmud, Yapeng Tian, Diana Marculescu

Sound source localization is a typical and challenging task that predicts the location of sound sources in a video. Previous single-source methods mainly used the audio-visual association as clues to localize sounding objects in each image.…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Shentong Mo, Yapeng Tian

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

The audio-visual segmentation (AVS) task aims to segment sounding objects from a given video. Existing works mainly focus on fusing audio and visual features of a given video to achieve sounding object masks. However, we observed that prior…

Sound · Computer Science 2023-08-02 Chen Liu, Peike Li, Xingqun Qi, Hu Zhang, Lincheng Li, Dadong Wang, Xin Yu

The task of Visual Sound Source Localization (VSSL) involves identifying the location of sound sources in visual scenes, integrating audio-visual data for enhanced scene understanding. Despite advancements in state-of-the-art (SOTA) models,…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Xavier Juanola, Gloria Haro, Magdalena Fuentes

The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim

Visual sound source localization is a fundamental perception task that aims to detect the location of sounding sources in a video given its audio. Despite recent progress, we identify two shortcomings in current methods: 1) most approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-09-01 Xavier Juanola, Giovana Morais, Magdalena Fuentes, Gloria Haro

The goal of Multilingual Visual Answer Localization (MVAL) is to locate a video segment that answers a given multilingual question. Existing methods either focus solely on visual modality or integrate visual and subtitle modalities.…

Multimedia · Computer Science 2024-11-06 Zhibin Wen, Bin Li

Learning how to localize and separate individual object sounds in the audio channel of the video is a difficult task. Current state-of-the-art methods predict audio masks from artificially mixed spectrograms, known as Mix-and-Separate…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Tanzila Rahman, Leonid Sigal

Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn to correlate the visual scene and sound, as well as localize the sound source only by observing them like humans? To investigate its…

Computer Vision and Pattern Recognition · Computer Science 2019-11-22 Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

Audio-Visual Segmentation (AVS) aims to identify and segment sound-producing objects in videos by leveraging both visual and audio modalities. It has emerged as a significant research area in multimodal perception, enabling fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Jia Li, Yapeng Tian

Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective.…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Recognizing the sounding objects in scenes is a longstanding objective in embodied AI, with diverse applications in robotics and AR/VR/MR. To that end, Audio-Visual Segmentation (AVS), taking as condition an audio signal to identify the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

Audio-visual sound source localization (AV-SSL) estimates the position of sound sources by fusing auditory and visual cues. Current AV-SSL methodologies typically require spatially-paired audio-visual data and cannot selectively localize…

Sound · Computer Science 2025-08-07 Yu Chen, Hongxu Zhu, Jiadong Wang, Kainan Chen, Xinyuan Qian