Related papers: Object-Based Audio Rendering

200 papers

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is…

Sound · Computer Science 2021-12-22 Daniel Arteaga, Jordi Pons

Generating accurate sounds for complex audio-visual scenes is challenging, especially in the presence of multiple objects and sound sources. In this paper, we propose an interactive object-aware audio generation model that grounds…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Tingle Li, Baihe Huang, Xiaobin Zhuang, Dongya Jia, Jiawei Chen, Yuping Wang, Zhuo Chen, Gopala Anumanchipalli, Yuxuan Wang

Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is…

Computer Vision and Pattern Recognition · Computer Science 2017-12-06 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models. Given the captured audio and an…

Sound · Computer Science 2021-09-28 Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In this work, inspired by…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-30 Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

In this work, we develop a multi-modal rendering framework comprising hapto-visual and auditory data. The prime focus is to haptically render point cloud data representing virtual 3-D models of cultural significance and also to handle…

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Ruohan Gao, Rogerio Feris, Kristen Grauman

Interactive audio spatialization technology previously developed for video game authoring and rendering has evolved into an essential component of platforms enabling shared immersive virtual experiences for future co-presence, remote…

Sound · Computer Science 2021-09-28 Jean-Marc Jot, Rémi Audfray, Mark Hertensteiner, Brian Schmidt

Object Based Audio (OBA) provides a new kind of audio experience, delivered to the audience to personalize and customize their experience of listening and to give them choice of what and how to hear their audio content. OBA can be applied…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-31 Mohammad Reza Hasanabadi

Grounding objects in images using visual cues is a well-established approach in computer vision, yet the potential of audio as a modality for object recognition and grounding remains underexplored. We introduce YOSS, "You Only Speak Once to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li

In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Relja Arandjelović, Andrew Zisserman

Advances in object tracking and acoustic beamforming are driving new capabilities in surveillance, human-computer interaction, and robotics. This work presents an embedded system that integrates deep learning-based tracking with beamforming…

Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore multimodal feature aggregation for the video instance segmentation task, in which we…

Computer Vision and Pattern Recognition · Computer Science 2023-01-26 Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

The audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

Objects make unique sounds under different perturbations, environment conditions, and poses relative to the listener. While prior works have modeled impact sounds and sound propagation in simulation, we lack a standard dataset of impact…

Sound · Computer Science 2023-06-19 Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug L. James, Jiajun Wu

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by…

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

How does audio describe the world around us? In this paper, we propose a method for generating an image of a scene from sound. Our method addresses the challenges of dealing with the large gaps that often exist between sight and sound. We…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene…

Sound · Computer Science 2022-03-01 Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool