Related papers: Source-Aware Neural Speech Coding for Noisy Speech…

SUNAC: Source-aware Unified Neural Audio Codec

Neural audio codecs (NACs) provide compact representations that can be leveraged in many downstream applications, in particular large language models. Yet most NACs encode mixtures of multiple sources in an entangled manner, which may…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-21 Ryo Aihara, Yoshiki Masuyama, Francesco Paissan, François G. Germain, Gordon Wichern, Jonathan Le Roux

Learning Source Disentanglement in Neural Audio Codec

Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative…

Sound · Computer Science 2025-02-12 Xiaoyu Bie, Xubo Liu, Gaël Richard

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Speech codecs that convert continuous speech signals into discrete tokens have become essential for speech language models. However, existing codecs struggle to balance high-quality reconstruction with semantically rich representations,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-16 Wenxi Chen, Xinsheng Wang, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Hanlin Wen, Shunshun Yin, Ming Tao, Xie Chen

SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs

Neural Speech Codecs face a fundamental trade-off at low bitrates: preserving acoustic fidelity often compromises semantic richness. To address this, we introduce SACodec, a novel codec built upon an asymmetric dual-quantizer that employs…

Sound · Computer Science 2025-12-25 Zhongren Dong, Bin Wang, Jing Han, Haotian Guo, Xiaojun Mo, Yimin Cao, Zixing Zhang

Neural Joint Source-Channel Coding

For reliable transmission across a noisy communication channel, classical results from information theory show that it is asymptotically optimal to separate out the source and channel coding processes. However, this decomposition can fall…

Machine Learning · Computer Science 2019-05-15 Kristy Choi, Kedar Tatwawadi, Aditya Grover, Tsachy Weissman, Stefano Ermon

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-08 Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

Contextual Memory-Enhanced Source Coding for Low-SNR Communications

While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR…

Information Theory · Computer Science 2026-05-08 Ziqiong Wang, Rongpeng Li

A High Fidelity and Low Complexity Neural Audio Coding

Audio coding is an essential module in the real-time communication system. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor…

Sound · Computer Science 2023-10-18 Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding

Speech codecs serve as bridges between continuous speech signals and large language models, yet face an inherent conflict between acoustic fidelity and semantic preservation. To mitigate this conflict, prevailing methods augment acoustic…

Sound · Computer Science 2026-01-28 Xin Zhang, Lin Li, Xiangni Lu, Jianquan Liu, Kong Aik Lee

SoundStream: An End-to-End Neural Audio Codec

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed by a fully…

Sound · Computer Science 2021-07-08 Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

A Neural Speech Codec for Noise Robust Speech Coding

This paper considers the joint compression and enhancement problem for speech signal in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector…

Sound · Computer Science 2025-09-03 Jiayi Huang, Zeyu Yan, Wenbin Jiang, He Wang, Fei Wen

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Robust Soft-Constrained Spatially Selective Active Noise Control for Hearables Under Secondary Path Variations

Spatially selective active noise control (SSANC) hearables aim to attenuate noise from certain directions at the eardrum while preserving desired speech arriving from selected directions. Existing SSANC systems typically assume an accurate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-19 Tong Xiao, Reinhild Roden, Matthias Blau, Simon Doclo

Soft-Constrained Spatially Selective Active Noise Control for Open-fitting Hearables

Recent advances in spatially selective active noise control (SSANC) using multiple microphones have enabled hearables to suppress undesired noise while preserving desired speech from a specific direction. Aiming to achieve minimal speech…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-18 Tong Xiao, Reinhild Roden, Matthias Blau, Simon Doclo

ADNAC: Audio Denoiser using Neural Audio Codec

Audio denoising is critical in signal processing, enhancing intelligibility and fidelity for applications like restoring musical recordings. This paper presents a proof-of-concept for adapting a state-of-the-art neural audio codec, the…

Sound · Computer Science 2025-11-04 Daniel Jimon, Mircea Vaida, Adriana Stan

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations

Neural audio codecs (NACs), which use neural networks to generate compact audio representations, have garnered interest for their applicability to many downstream tasks -- especially quantized codecs due to their compatibility with large…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-13 Ryo Aihara, Yoshiki Masuyama, Gordon Wichern, François G. Germain, Jonathan Le Roux

Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the…

Sound · Computer Science 2021-01-05 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Towards Audio Codec-based Speech Separation

Recent improvements in neural audio codec (NAC) models have generated interest in adopting pre-trained codecs for a variety of speech processing applications to take advantage of the efficiencies gained from high compression, but these have…

Sound · Computer Science 2024-07-08 Jia Qi Yip, Shengkui Zhao, Dianwen Ng, Eng Siong Chng, Bin Ma

Semantic Codebooks as Effective Priors for Neural Speech Compression

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on…

Sound · Computer Science 2025-12-29 Liuyang Bai, Weiyi Lu, Li Guo