Related papers: Universal Spatial Audio Transcoder

200 papers

Reasoning about spatial audio with large language models requires a spatial audio encoder as an acoustic front-end to obtain audio embeddings for further processing. Such an encoder needs to capture all information required to detect the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-04 Kevin Wilkinghoff, Zheng-Hua Tan

Self-supervised learning (SSL) has revolutionized audio representations, yet models often remain domain-specific, focusing on either speech or non-speech tasks. In this work, we present Universal Speech and Audio Distillation (USAD), a…

Sound · Computer Science 2025-08-19 Heng-Jui Chang, Saurabhchand Bhati, James Glass, Alexander H. Liu

This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms,…

Large, self-supervised vision models have led to substantial advancements for automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data which has rich structure with multi-sensor,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-06 Jeremy Irvin, Lucas Tao, Joanne Zhou, Yuntao Ma, Langston Nashold, Benjamin Liu, Andrew Y. Ng

Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-20 Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

Spatial audio is an essential medium to audiences for 3D visual and auditory experience. However, the recording devices and techniques are expensive or inaccessible to the general public. In this work, we propose a self-supervised audio…

Sound · Computer Science 2019-05-15 Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-27 Karn N. Watcharasupat, Alexander Lerch

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of…

Computer Vision and Pattern Recognition · Computer Science 2023-01-31 Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We…

Sound · Computer Science 2024-07-10 Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu

Spatial audio understanding is essential for accurately perceiving and interpreting acoustic environments. However, existing audio-language models exhibit limitations in processing spatial audio and perceiving spatial acoustic scenes. To…

Sound · Computer Science 2025-09-19 Jinbo Hu, Yin Cao, Ming Wu, Zhenbo Luo, Jun Yang

Encoder-decoder models have achieved remarkable success in speech and text tasks, yet efficiently adapting these models to diverse uni/multi-modal scenarios remains an open challenge. In this paper, we propose Whisper-UT, a unified and…

In this paper, we conduct a holistic exploration of the Universal Decompositional Semantic (UDS) Parsing. We first introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks.…

Computation and Language · Computer Science 2023-07-26 Hexuan Deng, Xin Zhang, Meishan Zhang, Xuebo Liu, Min Zhang

Loudspeaker-based spatial audio reproduction schemes are increasingly used for evaluating hearing aids in complex acoustic conditions. To further establish the feasibility of this approach, this study investigated the interaction between…

Sound · Computer Science 2015-08-04 Giso Grimm, Stephan Ewert, Volker Hohmann

Given an input sound signal and a target virtual sound source, sound spatialisation algorithms manipulate the signal so that a listener perceives it as though it were emitted from the target source. There exist several established…

Sound · Computer Science 2017-11-28 Ali Tarzan, Marco Alunno, Paolo Bientinesi

A recently proposed analogue transformation method has allowed the extension of transformation acoustics to general spacetime transformations. We analyze here in detail the differences between this new analogue transformation acoustics…

General Relativity and Quantum Cosmology · Physics 2014-07-09 C. García-Meca, S. Carloni, C. Barceló, G. Jannes, J. Sánchez-Dehesa, A. Martínez

Spatial audio reasoning enables machines to interpret auditory scenes by understanding events and their spatial attributes. In this work, we focus on spatial audio understanding with an emphasis on reasoning about moving sources. First, we…

Sound · Computer Science 2025-09-19 Arvind Krishna Sridhar, Yinyi Guo, Erik Visser

Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Universal audio codecs learn entangled representations across audio types, whereas some specific codecs offer decoupled representations but are limited to speech. Real-world audio, however, often contains mixed speech and background sounds,…

Sound · Computer Science 2025-09-12 Xiaoxue Luo, Jinwei Huang, Runyan Yang, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-01 Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

This paper introduces a new paradigm for sound source localization referred to as virtual acoustic space traveling (VAST) and presents a first dataset designed for this purpose. Existing sound source localization methods are either based…

Sound · Computer Science 2016-12-20 Clément Gaultier, Saurabh Kataria, Antoine Deleforge