Related papers: SpatialCodec: Neural Spatial Speech Coding

200 papers

Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-17 Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit better compression performance than conventional methods. Despite significant progress, existing methods still have limitations in…

Sound · Computer Science 2024-07-31 Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

Neural audio codecs have recently enabled high-fidelity reconstruction at high compression rates, especially for speech. However, speech and non-speech audio exhibit fundamentally different spectral characteristics: speech energy…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Haoran Wang, Jiatong Shi, Jinchuan Tian, Bohan Li, Kai Yu, Shinji Watanabe

Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…

Sound · Computer Science 2024-09-10 Bing Yang, Xiaofei Li

Although fully end-to-end speaker diarization systems have made significant progress in recent years, modular systems often achieve superior results in real-world scenarios due to their greater adaptability and robustness. Historically,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-26 Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan

Neural audio codecs have revolutionized audio processing by enabling speech tasks to be performed on highly compressed representations. Recent work has shown that speech separation can be achieved within these compressed domains, offering…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-28 Jia Qi Yip, Chin Yuen Kwok, Bin Ma, Eng Siong Chng

Multi-channel speech enhancement extracts speech using multiple microphones that capture spatial cues. Effectively utilizing directional information is key for multi-channel enhancement. Deep learning shows great potential on multi-channel…

Sound · Computer Science 2023-09-21 Jiahui Pan, Pengjie Shen, Hui Zhang, Xueliang Zhang

Speech codecs serve as a crucial bridge in unifying speech and text language models. Existing codec methods face several challenges in semantic encoding, such as residual paralinguistic information (e.g., timbre, emotion), insufficient…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-06 Chunyu Qiang, Haoyu Wang, Cheng Gong, Tianrui Wang, Ruibo Fu, Tao Wang, Ruilong Chen, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Longbiao Wang, Jianwu Dang, Jianhua Tao

This paper considers the joint compression and enhancement problem for speech signals in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector…

Sound · Computer Science 2025-09-03 Jiayi Huang, Zeyu Yan, Wenbin Jiang, He Wang, Fei Wen

Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous…

Machine Learning · Computer Science 2025-10-28 Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

Neural audio codecs form the foundational building blocks for language model (LM)-based speech generation. Typically, there is a trade-off between frame rate and audio quality. This study introduces a low-frame-rate, semantically enhanced…

Sound · Computer Science 2025-10-02 Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu

Audio coding is an essential module in real-time communication systems. Neural audio codecs can compress audio samples at a low bitrate thanks to the strong modeling and generative capabilities of deep neural networks. To address the poor…

Sound · Computer Science 2023-10-18 Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

Neural Speech Codecs face a fundamental trade-off at low bitrates: preserving acoustic fidelity often compromises semantic richness. To address this, we introduce SACodec, a novel codec built upon an asymmetric dual-quantizer that employs…

Sound · Computer Science 2025-12-25 Zhongren Dong, Bin Wang, Jing Han, Haotian Guo, Xiaojun Mo, Yimin Cao, Zixing Zhang

This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms,…

We introduce BANC, a neural binaural audio codec designed for efficient speech compression in single and two-speaker scenarios while preserving the spatial location information of each speaker. Our key contributions are as follows: 1) The…

Sound · Computer Science 2024-11-26 Anton Ratnarajah, Shi-Xiong Zhang, Dong Yu

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs…

Sound · Computer Science 2024-12-02 Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

Universal audio codecs learn entangled representations across audio types, whereas some specific codecs offer decoupled representations but are limited to speech. Real-world audio, however, often contains mixed speech and background sounds,…

Sound · Computer Science 2025-09-12 Xiaoxue Luo, Jinwei Huang, Runyan Yang, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-27 Karn N. Watcharasupat, Alexander Lerch

Neural audio codecs have grown in popularity due to their potential for efficiently modeling audio with transformers. Such advanced codecs convert audio from a highly continuous waveform into low-sampled discrete units. In…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Samir Sadok, Julien Hauret, Éric Bavu