Related papers: DiffAU: Diffusion-Based Ambisonics Upscaling

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

Spatial audio is a crucial component in creating immersive experiences. Traditional simulation-based approaches to generate spatial audio rely on expertise, have limited scalability, and assume independence between semantic and spatial…

Sound · Computer Science 2025-07-16 Saksham Singh Kushwaha, Jianbo Ma, Mark R. P. Thomas, Yapeng Tian, Avery Bruni

Ambisonics Super-Resolution Using A Waveform-Domain Neural Network

Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally one would be able to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-04 Ismael Nawfal, Symeon Delikaris Manias, Mehrez Souden, Juha Merimaa, Joshua Atkins, Elisabeth McMullin, Shadi Pirhosseinloo, Daniel Phillips

ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model

We introduce ImmerseDiffusion, an end-to-end generative audio model that produces 3D immersive soundscapes conditioned on the spatial, temporal, and environmental conditions of sound objects. ImmerseDiffusion is trained to generate…

Sound · Computer Science 2025-02-11 Mojtaba Heydari, Mehrez Souden, Bruno Conejo, Joshua Atkins

SIRUP: A diffusion-based virtual upmixer of steering vectors for highly-directive spatialization with first-order ambisonics

This paper presents virtual upmixing of steering vectors captured by a fewer-channel spherical microphone array. This challenge has conventionally been addressed by recovering the directions and signals of sound sources from first-order…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-23 Emilio Picard, Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii

DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos

Spatial audio is crucial for immersive 360-degree video experiences, yet most 360-degree videos lack it due to the difficulty of capturing spatial audio during recording. Automatically generating spatial audio such as first-order ambisonics…

Sound · Computer Science 2026-04-13 Ziyu Luo, Lin Chen, Qiang Qu, Xiaoming Chen, Yiran Shen

DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos

Spatial audio is crucial for immersive 360-degree video experiences, yet most 360-degree videos lack it due to the difficulty of capturing spatial audio during recording. Automatically generating spatial audio such as first-order ambisonics…

Sound · Computer Science 2026-05-05 Ziyu Luo, Lin Chen, Qiang Qu, Xiaoming Chen, Yiran Shen

Methodology for 3D sound synthesis of directional acoustic sources by higher-order ambisonics

This paper presents the 3D soundfield synthesis of the pressure field radiated by directional acoustic sources using both the multimodal method and higher-order ambisonics (HOA). Ambisonics is a technique for encoding and reproducing…

Classical Physics · Physics 2024-09-20 Philippe Thorner, Eric Bavu, Jean-Baptiste Doc, Christophe Langrenne

First Order Ambisonics Domain Spatial Augmentation for DNN-based Direction of Arrival Estimation

In this paper, we propose a novel data augmentation method for training neural networks for Direction of Arrival (DOA) estimation. This method focuses on expanding the representation of the DOA subspace of a dataset. Given some input data,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-11 Luca Mazzon, Yuma Koizumi, Masahiro Yasuda, Noboru Harada

A first-order DirAC-based parametric Ambisonic coder for immersive communications

Directional Audio Coding (DirAC) is a proven method for parametrically representing a 3D audio scene in B-format and is capable of reproducing it on arbitrary loudspeaker layouts. Although such a method seems well suited for low bitrate…

Audio and Speech Processing · Electrical Eng. & Systems 2025-04-01 Guillaume Fuchs, Florin Ghido, Dominik Weckbecker, Oliver Thiergart

Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

Spatial audio methods are gaining a growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-16 Eleonora Grassucci, Gioia Mancini, Christian Brignone, Aurelio Uncini, Danilo Comminiello

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-17 Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression

Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-25 Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki

FOA Tokenizer: Low-bitrate Neural Codec for First Order Ambisonics with Spatial Consistency Loss

Neural audio codecs have been widely studied for mono and stereo signals, but spatial audio remains largely unexplored. We present the first discrete neural spatial audio codec for first-order ambisonics (FOA). Building on the WavTokenizer…

Sound · Computer Science 2025-10-28 Parthasaarathy Sudarsanam, Sebastian Braun, Hannes Gamper

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling

First-order Ambisonics (FOA) is a standard spatial audio format based on spherical harmonic decomposition. Its zeroth- and first-order components capture the sound pressure and particle velocity, respectively. Recently, physics-informed…

Sound · Computer Science 2026-03-25 Yoshiki Masuyama, Francois G. Germain, Gordon Wichern, Chiori Hori, Jonathan Le Roux

Improving Spatial Resolution of First-order Ambisonics Using Sparse MDCT Representation

The paper presents a method for improving spatial resolution of first-order ambisonic audio. The method is based on time/frequency decomposition of the audio with subsequent extraction of a directed plane wave from each frequency component.…

Sound · Computer Science 2023-12-14 Denis Likhachov, Nick Petrovsky, Elias Azarov

Generating Moving 3D Soundscapes with Latent Diffusion Models

Spatial audio has become central to immersive applications such as VR/AR, cinema, and music. Existing generative audio models are largely limited to mono or stereo formats and cannot capture the full 3D localization cues available in…

Sound · Computer Science 2025-09-22 Christian Templin, Yanda Zhu, Hao Wang

HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset

This contribution introduces a dataset of 7th-order Ambisonic Room Impulse Responses (HOA-RIRs), created using the Image Source Method. By employing higher-order Ambisonics, our dataset enables precise spatial audio reproduction, a critical…

Sound · Computer Science 2025-06-02 Shivam Saini, Jürgen Peissig

Diff-GO: Diffusion Goal-Oriented Communications to Achieve Ultra-High Spectrum Efficiency

The latest advances in artificial intelligence (AI) present many unprecedented opportunities to achieve much improved bandwidth saving in communications. Unlike conventional communication systems focusing on packet transport, rich datasets…

Machine Learning · Computer Science 2023-12-07 Achintha Wijesinghe, Songyang Zhang, Suchinthaka Wanninayaka, Weiwei Wang, Zhi Ding

Perceptual Compensation of Ambisonics Recordings for Reproduction in Room

Ambisonics is a method for capturing and rendering a sound field accurately, assuming that the acoustics of the playback room does not significantly influence the sound field. However, in practice, the acoustics of the playback room may…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Ali Fallah, Shun Nakamura, Steven van de Par

Dynamic Real-Time Ambisonics Order Adaptation for Immersive Networked Music Performances

Advanced remote applications such as Networked Music Performance (NMP) require solutions to guarantee immersive real-world-like interaction among users. Therefore, the adoption of spatial audio formats, such as Ambisonics, is fundamental to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-04 Paolo Ostan, Carlo Centofanti, Mirco Pezzoli, Alberto Bernardini, Claudia Rinaldi, Fabio Antonacci