Related papers: ESC: Efficient Speech Coding with Cross-Scale Resi…

200 papers

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Neural speech codecs excel in reconstructing clean speech signals; however, their efficacy in complex acoustic environments and downstream signal processing tasks remains underexplored. In this study, we introduce a novel benchmark named…

Sound · Computer Science 2025-05-29 Haoran Wang, Guanyu Chen, Bohan Li, Hankun Wang, Yiwei Guo, Zhihan Li, Xie Chen, Kai Yu

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science 2022-07-07 Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-08 Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content, especially for signals that…

Sound · Computer Science 2026-05-08 Xiangbo Wang, Wenbin Jiang, Jin Wang, Yubo You, Sheng Fang, Fei Wen

Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with…

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-18 Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most…

Sound · Computer Science 2025-07-01 Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

Error resilient tools like Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential to maintain a reliable speech communication for applications like Voice over Internet Protocol (VoIP), where packets are frequently…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on…

Sound · Computer Science 2025-12-29 Liuyang Bai, Weiyi Lu, Li Guo

Scalability and efficiency are desired in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-14 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. The ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. However,…

Sound · Computer Science 2019-07-05 Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao

Recent advancements in neural audio codecs have not only enabled superior audio compression but also enhanced speech synthesis techniques. Researchers are now exploring their potential as universal acoustic feature extractors for a broader…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-21 Wei-Cheng Tseng, David Harwath

Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-21 Leyan Yang, Ronghui Hu, Yang Xu, Jing Lu

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science 2026-05-08 Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Language model based text-to-speech (TTS) models, like VALL-E, have gained attention for their outstanding in-context learning capability in zero-shot scenarios. Neural speech codec is a critical component of these models, which can convert…

Sound · Computer Science 2024-03-12 Yong Ren, Tao Wang, Jiangyan Yi, Le Xu, Jianhua Tao, Chuyuan Zhang, Junzuo Zhou

Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-12 Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling

In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often lead to over-suppressed speech, creating artifacts during listening and harming downstream task performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-03 Hsin-Tien Chiang, Hao Zhang, Yong Xu, Meng Yu, Dong Yu