Related papers: Universal Spatial Audio Transcoder

DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models

Reasoning about spatial audio with large language models requires a spatial audio encoder as an acoustic front-end to obtain audio embeddings for further processing. Such an encoder needs to capture all information required to detect the…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-11-04 · Kevin Wilkinghoff, Zheng-Hua Tan

USAD: Universal Speech and Audio Representation via Distillation

Self-supervised learning (SSL) has revolutionized audio representations, yet models often remain domain-specific, focusing on either speech or non-speech tasks. In this work, we present Universal Speech and Audio Distillation (USAD), a…

Sound · Computer Science · 2025-08-19 · Heng-Jui Chang, Saurabhchand Bhati, James Glass, Alexander H. Liu

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms,…

Sound · Computer Science · 2024-06-26 · Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari

USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery

Large, self-supervised vision models have led to substantial advancements for automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data which has rich structure with multi-sensor,…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-06 · Jeremy Irvin, Lucas Tao, Joanne Zhou, Yuntao Ma, Langston Nashold, Benjamin Liu, Andrew Y. Ng

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-05-20 · Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

Self-supervised Audio Spatialization with Correspondence Classifier

Spatial audio is an essential medium to audiences for 3D visual and auditory experience. However, the recording devices and techniques are expensive or inaccessible to the general public. In this work, we propose a self-supervised audio…

Sound · Computer Science · 2019-05-15 · Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Quantifying Spatial Audio Quality Impairment

Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-08-27 · Karn N. Watcharasupat, Alexander Lerch

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-31 · Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

SpatialCodec: Neural Spatial Speech Coding

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We…

Sound · Computer Science · 2024-07-10 · Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu

SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing

Spatial audio understanding is essential for accurately perceiving and interpreting acoustic environments. However, existing audio-language models exhibit limitations in processing spatial audio and perceiving spatial acoustic scenes. To…

Sound · Computer Science · 2025-09-19 · Jinbo Hu, Yin Cao, Ming Wu, Zhenbo Luo, Jun Yang

Whisper-UT: A Unified Translation Framework for Speech and Text

Encoder-decoder models have achieved remarkable success in speech and text tasks, yet efficiently adapting these models to diverse uni/multi-modal scenarios remains an open challenge. In this paper, we propose Whisper-UT, a unified and…

Computation and Language · Computer Science · 2025-09-23 · Cihan Xiao, Matthew Wiesner, Debashish Chakraborty, Reno Kriz, Keith Cunningham, Kenton Murray, Kevin Duh, Luis Tavarez-Arce, Paul McNamee, Sanjeev Khudanpur

Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

In this paper, we conduct a holistic exploration of the Universal Decompositional Semantic (UDS) Parsing. We first introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks.…

Computation and Language · Computer Science · 2023-07-26 · Hexuan Deng, Xin Zhang, Meishan Zhang, Xuebo Liu, Min Zhang

Evaluation of spatial audio reproduction schemes for application in hearing aid research

Loudspeaker-based spatial audio reproduction schemes are increasingly used for evaluating hearing aids in complex acoustic conditions. To further establish the feasibility of this approach, this study investigated the interaction between…

Sound · Computer Science · 2015-08-04 · Giso Grimm, Stephan Ewert, Volker Hohmann

Assessment of sound spatialisation algorithms for sonic rendering with headsets

Given an input sound signal and a target virtual sound source, sound spatialisation algorithms manipulate the signal so that a listener perceives it as though it were emitted from the target source. There exist several established…

Sound · Computer Science · 2017-11-28 · Ali Tarzan, Marco Alunno, Paolo Bientinesi

Spacetime transformation acoustics

A recently proposed analogue transformation method has allowed the extension of transformation acoustics to general spacetime transformations. We analyze here in detail the differences between this new analogue transformation acoustics…

General Relativity and Quantum Cosmology · Physics · 2014-07-09 · C. García-Meca, S. Carloni, C. Barceló, G. Jannes, J. Sánchez-Dehesa, A. Martínez

Spatial Audio Motion Understanding and Reasoning

Spatial audio reasoning enables machines to interpret auditory scenes by understanding events and their spatial attributes. In this work, we focus on spatial audio understanding with an emphasis on reasoning about moving sources. First, we…

Sound · Computer Science · 2025-09-19 · Arvind Krishna Sridhar, Yinyi Guo, Erik Visser

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-07 · Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners

Universal audio codecs learn entangled representations across audio types, whereas some specific codecs offer decoupled representations but are limited to speech. Real-world audio, however, often contains mixed speech and background sounds,…

Sound · Computer Science · 2025-09-12 · Xiaoxue Luo, Jinwei Huang, Runyan Yang, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

Audio Prompt Tuning for Universal Sound Separation

Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-12-01 · Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

VAST: The Virtual Acoustic Space Traveler Dataset

This paper introduces a new paradigm for sound source localization referred to as virtual acoustic space traveling (VAST) and presents a first dataset designed for this purpose. Existing sound source localization methods are either based…

Sound · Computer Science · 2016-12-20 · Clément Gaultier, Saurabh Kataria, Antoine Deleforge