Related papers: Cross-Scale Vector Quantization for Scalable Neura…

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Variable Bitrate Residual Vector Quantization for Audio Coding

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of…

Sound · Computer Science 2025-04-29 Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades…

Sound · Computer Science 2025-06-23 Yunkee Chae, Kyogu Lee

Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization

Scalability and efficiency are desired in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-14 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-20 Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science 2026-05-08 Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Rate-Adaptive Semantic Communication via Multi-Stage Vector Quantization

This paper proposes a novel framework for rate-adaptive semantic communication based on multi-stage vector quantization (VQ), termed MSVQ-SC. Unlike conventional single-stage VQ approaches, which require exponentially larger…

Signal Processing · Electrical Eng. & Systems 2025-10-06 Jinsung Park, Junyong Shin, Yongjeong Oh, Jihun Park, Yo-Seb Jeon

Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content, especially for signals that…

Sound · Computer Science 2026-05-08 Xiangbo Wang, Wenbin Jiang, Jin Wang, Yubo You, Sheng Fang, Fei Wen

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec…

Sound · Computer Science 2023-07-26 Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu

Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication

This paper proposes StreamCodec, a streamable neural audio codec designed for real-time communication. StreamCodec adopts a fully causal, symmetric encoder-decoder structure and operates in the modified discrete cosine transform (MDCT)…

Sound · Computer Science 2025-04-10 Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

Vector Quantization (VQ) has emerged as a prominent weight compression technique, showcasing substantially lower quantization errors than uniform quantization across diverse models, particularly in extreme compression scenarios. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Shuaiting Li, Juncan Deng, Chenxuan Wang, Kedong Xu, Rongtao Deng, Hong Gu, Haibin Shen, Kejie Huang

Improving Test-Time Performance of RVQ-based Neural Codecs

The residual vector quantization (RVQ) technique plays a central role in recent advances in neural audio codecs. These models effectively synthesize high-fidelity audio from a limited number of codes due to the hierarchical structure among…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-24 Hyeongju Kim, Junhyeok Lee, Jacob Morton, Juheon Lee, Jinhyeok Yang

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs…

Sound · Computer Science 2024-10-04 Yuzhe Gu, Enmao Diao

ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs

Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-12 Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling

Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization

Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech…

Machine Learning · Computer Science 2023-10-05 Tanmay Gautam, Reid Pryzant, Ziyi Yang, Chenguang Zhu, Somayeh Sojoudi

Scalable Image Tokenization with Index Backpropagation Quantization

Existing vector quantization (VQ) methods struggle with scalability, largely attributed to the instability of the codebook that undergoes partial updates during training. The codebook is prone to collapse as utilization decreases, due to…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang

MOC-RVQ: Multilevel Codebook-Assisted Digital Generative Semantic Communication

Vector quantization-based image semantic communication systems have successfully boosted transmission efficiency, but face challenges with conflicting requirements between codebook design and digital constellation modulation. Traditional…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Yingbin Zhou, Yaping Sun, Guanying Chen, Xiaodong Xu, Hao Chen, Binhong Huang, Shuguang Cui, Ping Zhang

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

The tokenization of speech with neural audio codec models is a vital part of modern AI pipelines for the generation or understanding of speech, alone or in a multimodal context. Traditionally such tokenization models have concentrated on…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-02 Julian D Parker, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu