Related papers: ESC: Efficient Speech Coding with Cross-Scale Resi…

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency

Neural speech codecs excel in reconstructing clean speech signals; however, their efficacy in complex acoustic environments and downstream signal processing tasks remains underexplored. In this study, we introduce a novel benchmark named…

Sound · Computer Science 2025-05-29 Haoran Wang, Guanyu Chen, Bohan Li, Hankun Wang, Yiwei Guo, Zhihan Li, Xie Chen, Kai Yu

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science 2022-07-07 Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-08 Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content, especially for signals that…

Sound · Computer Science 2026-05-08 Xiangbo Wang, Wenbin Jiang, Jin Wang, Yubo You, Sheng Fang, Fei Wen

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-25 Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharhi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe

Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec

Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-18 Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

FreeCodec: A disentangled neural speech codec with fewer tokens

Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most…

Sound · Computer Science 2025-07-01 Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

On Improving Error Resilience of Neural End-to-End Speech Coders

Error resilient tools like Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential to maintain a reliable speech communication for applications like Voice over Internet Protocol (VoIP), where packets are frequently…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus

Semantic Codebooks as Effective Priors for Neural Speech Compression

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on…

Sound · Computer Science 2025-12-29 Liuyang Bai, Weiyi Lu, Li Guo

Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization

Scalability and efficiency are desired in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-14 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. The ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. However,…

Sound · Computer Science 2019-07-05 Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs

Recent advancements in neural audio codecs have not only enabled superior audio compression but also enhanced speech synthesis techniques. Researchers are now exploring their potential as universal acoustic feature extractors for a broader…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-21 Wei-Cheng Tseng, David Harwath

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec

Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-21 Leyan Yang, Ronghui Hu, Yang Xu, Jing Lu

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science 2026-05-08 Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Fewer-token Neural Speech Codec with Time-invariant Codes

Language model based text-to-speech (TTS) models, like VALL-E, have gained attention for their outstanding in-context learning capability in zero-shot scenarios. Neural speech codec is a critical component of these models, which can convert…

Sound · Computer Science 2024-03-12 Yong Ren, Tao Wang, Jiangyan Yi, Le Xu, Jianhua Tao, Chuyuan Zhang, Junzuo Zhou

ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs

Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-12 Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often lead to over-suppressed speech, creating artifacts during listening and harming downstream tasks performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-03 Hsin-Tien Chiang, Hao Zhang, Yong Xu, Meng Yu, Dong Yu