Related papers: Object-Based Audio Rendering

Multichannel-based learning for audio object extraction

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is…

Sound · Computer Science 2021-12-22 Daniel Arteaga, Jordi Pons

Sounding that Object: Interactive Object-Aware Image to Audio Generation

Generating accurate sounds for complex audio-visual scenes is challenging, especially in the presence of multiple objects and sound sources. In this paper, we propose an interactive object-aware audio generation model that grounds…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Tingle Li, Baihe Huang, Xiaobin Zhuang, Dongya Jia, Jiawei Chen, Yuping Wang, Zhuo Chen, Gopala Anumanchipalli, Yuxuan Wang

Object Referring in Visual Scene with Spoken Language

Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is…

Computer Vision and Pattern Recognition · Computer Science 2017-12-06 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Scene-Aware Audio Rendering via Deep Acoustic Analysis

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models. Given the captured audio and an…

Sound · Computer Science 2021-09-28 Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In this work, inspired by…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-30 Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Combined Hapto-Visual and Auditory Rendering of Cultural Heritage Objects

In this work, we develop a multi-modal rendering framework comprising of hapto-visual and auditory data. The prime focus is to haptically render point cloud data representing virtual 3-D models of cultural significance and also to handle…

Multimedia · Computer Science 2020-10-06 Praseedha Krishnan Aniyath, Sreeni Kamalalayam Gopalan, Priyadarshini K, Subhasis Chaudhuri

Learning to Separate Object Sounds by Watching Unlabeled Video

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Ruohan Gao, Rogerio Feris, Kristen Grauman

Rendering Spatial Sound for Interoperable Experiences in the Audio Metaverse

Interactive audio spatialization technology previously developed for video game authoring and rendering has evolved into an essential component of platforms enabling shared immersive virtual experiences for future co-presence, remote…

Sound · Computer Science 2021-09-28 Jean-Marc Jot, Rémi Audfray, Mark Hertensteiner, Brian Schmidt

A Novel Approach for Object Based Audio Broadcasting

Object Based Audio (OBA) provides a new kind of audio experience, delivered to the audience to personalize and customize their experience of listening and to give them choice of what and how to hear their audio content. OBA can be applied…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-31 Mohammad Reza Hasanabadi

You Only Speak Once to See

Grounding objects in images using visual cues is a well-established approach in computer vision, yet the potential of audio as a modality for object recognition and grounding remains underexplored. We introduce YOSS, "You Only Speak Once to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li

Objects that Sound

In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Relja Arandjelović, Andrew Zisserman

Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments

Advances in object tracking and acoustic beamforming are driving new capabilities in surveillance, human-computer interaction, and robotics. This work presents an embedded system that integrates deep learning-based tracking with beamforming…

Sound · Computer Science 2025-11-25 Jorge Ortigoso-Narro, Jose A. Belloch, Adrian Amor-Martin, Sandra Roger, Maximo Cobos

Object Segmentation with Audio Context

Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore the multimodal feature aggregation for video instance segmentation task, in which we…

Computer Vision and Pattern Recognition · Computer Science 2023-01-26 Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

Object-aware Sound Source Localization via Audio-Visual Scene Understanding

Audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

RealImpact: A Dataset of Impact Sound Fields for Real Objects

Objects make unique sounds under different perturbations, environment conditions, and poses relative to the listener. While prior works have modeled impact sounds and sound propagation in simulation, we lack a standard dataset of impact…

Sound · Computer Science 2023-06-19 Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug L. James, Jiajun Wu

A Model You Can Hear: Audio Identification with Playable Prototypes

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by…

Sound · Computer Science 2022-08-08 Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

How does audio describe the world around us? In this paper, we propose a method for generating an image of a scene from sound. Our method addresses the challenges of dealing with the large gaps that often exist between sight and sound. We…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene…

Sound · Computer Science 2022-03-01 Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool