Related papers: FlowGrad: Using Motion for Visual Sound Source Loc…

Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization

Learning to localize the sound source in videos without explicit annotations is a novel area of audio-visual research. Existing work in this area focuses on creating attention maps to capture the correlation between the two modalities to…

Computer Vision and Pattern Recognition · Computer Science 2022-11-08 Dennis Fedorishin, Deen Dayal Mohan, Bhavin Jawade, Srirangaraj Setlur, Venu Govindaraju

Do We Need Sound for Sound Source Localization?

During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

How to Listen? Rethinking Visual Sound Localization

Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and…

Sound · Computer Science 2022-04-12 Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello

Learning to Localize Sound Source in Visual Scenes

Visual events are usually accompanied by sounds in our daily lives. We pose the question: Can the machine learn the correspondence between visual scene and the sound, and localize the sound source only by observing sound and visual scene…

Computer Vision and Pattern Recognition · Computer Science 2019-02-18 Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

Learning from Silence and Noise for Visual Sound Source Localization

Visual sound source localization is a fundamental perception task that aims to detect the location of sounding sources in a video given its audio. Despite recent progress, we identify two shortcomings in current methods: 1) most approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-09-01 Xavier Juanola, Giovana Morais, Magdalena Fuentes, Gloria Haro

Sound Source Localization is All about Cross-Modal Alignment

Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective.…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations

The objective of this paper is to perform audio-visual sound source separation, i.e. to separate component audios from a mixture based on the videos of sound sources. Moreover, we aim to pinpoint the source location in the input video…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Lingyu Zhu, Esa Rahtu

Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn to correlate the visual scene and sound, as well as localize the sound source only by observing them like humans? To investigate its…

Computer Vision and Pattern Recognition · Computer Science 2019-11-22 Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

Audio Simulation for Sound Source Localization in Virtual Evironment

Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem. Acoustic methods in such predominantly indoor scenarios encounter difficulty due to the reverberant nature. In this study, we aim to…

Machine Learning · Computer Science 2024-04-03 Yi Di Yuan, Swee Liang Wong, Jonathan Pan

Object-aware Sound Source Localization via Audio-Visual Scene Understanding

Audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Audio-visual source localization is a challenging task that aims to predict the location of visual sound sources in a video. Since collecting ground-truth annotations of sounding objects can be costly, a plethora of weakly-supervised…

Sound · Computer Science 2022-09-21 Shentong Mo, Pedro Morgado

Localizing Visual Sounds the Easy Way

Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training. Previous works often seek high audio-visual similarities for likely positive…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Shentong Mo, Pedro Morgado

Audio-Visual Grouping Network for Sound Localization from Mixtures

Sound source localization is a typical and challenging task that predicts the location of sound sources in a video. Previous single-source methods mainly used the audio-visual association as clues to localize sounding objects in each image.…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Shentong Mo, Yapeng Tian

Multiple Sound Sources Localization from Coarse to Fine

How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking. To solve this problem, we develop a two-stage audiovisual learning framework…

Computer Vision and Pattern Recognition · Computer Science 2020-07-15 Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin

Leveraging Category Information for Single-Frame Visual Sound Source Separation

Visual sound source separation aims at identifying sound components from a given sound mixture with the presence of visual cues. Prior works have demonstrated impressive results, but at the expense of large multi-stage architectures and…

Computer Vision and Pattern Recognition · Computer Science 2021-04-19 Lingyu Zhu, Esa Rahtu

Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes

Purpose: Surgical scene understanding is key to advancing computer-aided and intelligent surgical systems. Current approaches predominantly rely on visual data or end-to-end learning, which limits fine-grained contextual modeling. This work…

Sound · Computer Science 2026-05-05 Jonas Hein, Lazaros Vlachopoulos, Maurits Geert Laurent Olthof, Bastian Sigrist, Philipp Fürnstahl, Matthias Seibold

Localizing Visual Sounds the Hard Way

The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate…

Computer Vision and Pattern Recognition · Computer Science 2018-11-12 Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video. Existing self-supervised and weakly supervised source localization methods struggle to accurately…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Tanvir Mahmud, Yapeng Tian, Diana Marculescu

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen