Related papers: Self-supervised Audio Spatialization with Correspo…

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

Humans perceive a rich auditory experience through the distinct sounds heard by each ear. Videos recorded with binaural audio, in particular, simulate how humans receive ambient sound. However, a large number of videos have monaural audio only, which…

Sound · Computer Science 2021-05-04 Yan-Bo Lin, Yu-Chiang Frank Wang

Self-Supervised Generation of Spatial Audio for 360 Video

We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video…

Sound · Computer Science 2018-09-10 Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

Telling Left from Right: Learning Spatial Correspondence of Sight and Sound

Self-supervised audio-visual learning aims to capture useful representations of video by leveraging correspondences between visual and audio inputs. Existing approaches have focused primarily on matching semantic information between the…

Computer Vision and Pattern Recognition · Computer Science 2020-06-15 Karren Yang, Bryan Russell, Justin Salamon

Visual-based spatial audio generation system for multi-speaker environments

In multimedia applications such as films and video games, spatial audio techniques are widely employed to enhance user experiences by simulating 3D sound: transforming mono audio into binaural formats. However, this process is often complex…

Multimedia · Computer Science 2025-02-14 Xiaojing Liu, Ogulcan Gurelli, Yan Wang, Joshua Reiss

Assessment of sound spatialisation algorithms for sonic rendering with headsets

Given an input sound signal and a target virtual sound source, sound spatialisation algorithms manipulate the signal so that a listener perceives it as though it were emitted from the target source. There exist several established…

Sound · Computer Science 2017-11-28 Ali Tarzan, Marco Alunno, Paolo Bientinesi

ASAudio: A Survey of Advanced Spatial Audio Research

With the rapid development of spatial audio technologies today, applications in AR, VR, and other scenarios have garnered extensive attention. Unlike traditional mono sound, spatial audio offers a more realistic and immersive auditory…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-21 Zhiyuan Zhu, Yu Zhang, Wenxiang Guo, Changhao Pan, Zhou Zhao

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

Detecting sound source objects within visual observation is important for autonomous robots to comprehend surrounding environments. Since sounding objects have a large variety with different appearances in our living environments, labeling…

Sound · Computer Science 2020-07-29 Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa

Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization

The objective of the sound source localization task is to enable machines to detect the location of sound-making objects within a visual scene. While the audio modality provides spatial cues to locate the sound source, existing approaches…

Multimedia · Computer Science 2023-08-21 Sung Jin Um, Dongjin Kim, Jung Uk Kim

Evaluation of spatial audio reproduction schemes for application in hearing aid research

Loudspeaker-based spatial audio reproduction schemes are increasingly used for evaluating hearing aids in complex acoustic conditions. To further establish the feasibility of this approach, this study investigated the interaction between…

Sound · Computer Science 2015-08-04 Giso Grimm, Stephan Ewert, Volker Hohmann

Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment

Learning from audio-visual data offers many possibilities to express correspondence between the audio and visual content, similar to the human perception that relates aural and visual information. In this work, we present a method for…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-23 Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of…

Computer Vision and Pattern Recognition · Computer Science 2023-01-31 Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

Quantifying Spatial Audio Quality Impairment

Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-27 Karn N. Watcharasupat, Alexander Lerch

Self-supervised Moving Vehicle Tracking with Stereo Sound

Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data…

Computer Vision and Pattern Recognition · Computer Science 2019-10-28 Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain. For sound event localization and detection (SELD) tasks several augmentation methods have been…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-20 Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

Unsupervised Sound Localization via Iterative Contrastive Learning

Sound localization aims to find the source of the audio signal in the visual scene. However, it is labor-intensive to annotate the correlations between the signals sampled from the audio and visual modalities, thus making it difficult to…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

Self-Supervised Visual Acoustic Matching

Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target…

Multimedia · Computer Science 2023-11-27 Arjun Somayazulu, Changan Chen, Kristen Grauman

Space-Time Memory Network for Sounding Object Localization in Videos

Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects. To this end, we propose a space-time memory network for sounding object localization in videos.…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Sizhe Li, Yapeng Tian, Chenliang Xu

Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement

While the spatial directivity of multichannel speech enhancement algorithms improves with the number of microphones, fitting large capture arrays into real-world edge devices is typically limited by physical constraints. To overcome this…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-08 Dongheon Lee, Ashutosh Pandey, Sanjeel Parekh, Daniel Wong, Jacob Donley, Buye Xu, Juan Azcarreta

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool