Related papers: Cross-Scale Vector Quantization for Scalable Neura…

200 papers

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science · 2024-10-21 · Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of…

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades…

Sound · Computer Science · 2025-06-23 · Yunkee Chae, Kyogu Lee

Scalability and efficiency are desirable in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-02-14 · Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Neural audio coding has emerged as an active research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-20 · Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science · 2026-05-08 · Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science · 2024-10-22 · Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

This paper proposes a novel framework for rate-adaptive semantic communication based on multi-stage vector quantization (VQ), termed MSVQ-SC. Unlike conventional single-stage VQ approaches, which require exponentially larger…

Signal Processing · Electrical Eng. & Systems · 2025-10-06 · Jinsung Park, Junyong Shin, Yongjeong Oh, Jihun Park, Yo-Seb Jeon
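The MSVQ-SC abstract contrasts multi-stage VQ with single-stage VQ, whose requirements grow exponentially (the sentence is cut off, so the exact claim is not visible here). A minimal sketch of the usual counting argument, assuming the bit budget is split evenly across stages: one codebook that addresses B bits directly needs 2^B codewords, while M stages of B/M bits each need only M·2^(B/M) codewords in total. The function names are hypothetical and are not taken from the paper.

```python
def single_stage_codewords(total_bits: int) -> int:
    # A single codebook must hold one codeword per bit pattern.
    return 2 ** total_bits


def multi_stage_codewords(total_bits: int, num_stages: int) -> int:
    # Assumption: the bit budget is divided evenly across the stages.
    if total_bits % num_stages != 0:
        raise ValueError("total_bits must be divisible by num_stages")
    bits_per_stage = total_bits // num_stages
    # Each stage has its own codebook of 2**bits_per_stage entries.
    return num_stages * 2 ** bits_per_stage


print(single_stage_codewords(40))    # 1099511627776
print(multi_stage_codewords(40, 4))  # 4096
```

Both configurations spend the same 40 bits per vector, but the staged version stores about 4 thousand codewords instead of about a trillion, which is why multi-stage schemes scale to higher rates.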

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content, especially for signals that…

Sound · Computer Science · 2026-05-08 · Xiangbo Wang, Wenbin Jiang, Jin Wang, Yubo You, Sheng Fang, Fei Wen
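Several abstracts in this list note that RVQ codecs spend a fixed number of codebooks on every frame. Assuming each frame emits one index per codebook, stored with log2 K bits and no entropy coding, the bitrate is frame rate × N_q × log2 K. It is therefore identical for silence and for dense music, and it can only change in steps of one codebook. The sketch computes this; the 75 frames/s and 1024-entry codebooks are illustrative values, not figures from the listed papers, and `rvq_bitrate_bps` is a hypothetical helper.

```python
import math


def rvq_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for an RVQ codec emitting one index per codebook per frame.

    Assumes every frame uses the same number of codebooks and each index
    costs log2(codebook_size) bits (no entropy coding).
    """
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index


# Hypothetical configuration: 75 frames/s, 1024-entry codebooks.
for n_q in (2, 4, 8):
    print(n_q, rvq_bitrate_bps(75, n_q, 1024))  # 1500.0, 3000.0, 6000.0 bps
```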

Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec…

Sound · Computer Science · 2023-07-26 · Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-30 · Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beak, Minje Kim

This paper proposes StreamCodec, a streamable neural audio codec designed for real-time communication. StreamCodec adopts a fully causal, symmetric encoder-decoder structure and operates in the modified discrete cosine transform (MDCT)…

Sound · Computer Science · 2025-04-10 · Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling

Vector Quantization (VQ) has emerged as a prominent weight compression technique, showcasing substantially lower quantization errors than uniform quantization across diverse models, particularly in extreme compression scenarios. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Shuaiting Li, Juncan Deng, Chenxuan Wang, Kedong Xu, Rongtao Deng, Hong Gu, Haibin Shen, Kejie Huang

The residual vector quantization (RVQ) technique plays a central role in recent advances in neural audio codecs. These models effectively synthesize high-fidelity audio from a limited number of codes due to the hierarchical structure among…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-24 · Hyeongju Kim, Junhyeok Lee, Jacob Morton, Juheon Lee, Jinhyeok Yang

Neural speech codecs aim to compress input signals into as few bits as possible while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs…

Sound · Computer Science · 2024-10-04 · Yuzhe Gu, Enmao Diao

Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-12 · Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling
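The abstract above describes codebook collapse as shrinking the effective codebook size. One common way to measure this, assumed here rather than taken from the paper, is to count how many codewords are actually used and to compute the perplexity exp(H) of the empirical code distribution. Perplexity equals the codebook size when all codes are used equally and drops toward 1 as usage concentrates on a few codes. The function `codebook_usage` and the example index streams are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def codebook_usage(indices: Iterable[int], codebook_size: int) -> Tuple[float, float]:
    """Return (fraction of codewords used, perplexity) for a stream of code indices.

    Perplexity = exp(entropy of the empirical code distribution), a common
    proxy for the "effective" codebook size.
    """
    counts = Counter(indices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one index")
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    used_fraction = len(counts) / codebook_size
    return used_fraction, math.exp(entropy)


healthy = list(range(8)) * 10      # every one of 8 codes used equally
collapsed = [3] * 75 + [5] * 5     # almost everything maps to code 3
print(codebook_usage(healthy, 8))    # perplexity close to 8
print(codebook_usage(collapsed, 8))  # perplexity close to 1
```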

Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech…

Machine Learning · Computer Science · 2023-10-05 · Tanmay Gautam, Reid Pryzant, Ziyi Yang, Chenguang Zhu, Somayeh Sojoudi

Existing vector quantization (VQ) methods struggle with scalability, largely because of the instability of the codebook, which undergoes only partial updates during training. The codebook is prone to collapse as utilization decreases, due to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang

Vector quantization-based image semantic communication systems have successfully boosted transmission efficiency, but face challenges with conflicting requirements between codebook design and digital constellation modulation. Traditional…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-01 · Yingbin Zhou, Yaping Sun, Guanying Chen, Xiaodong Xu, Hao Chen, Binhong Huang, Shuguang Cui, Ping Zhang

The tokenization of speech with neural audio codec models is a vital part of modern AI pipelines for the generation or understanding of speech, alone or in a multimodal context. Traditionally, such tokenization models have concentrated on…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-12-02 · Julian D Parker, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu