Related papers: NDVQ: Robust Neural Audio Codec with Normal Distri…

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades…

Sound · Computer Science 2025-06-23 Yunkee Chae, Kyogu Lee

Improving Test-Time Performance of RVQ-based Neural Codecs

The residual vector quantization (RVQ) technique plays a central role in recent advances in neural audio codecs. These models effectively synthesize high-fidelity audio from a limited number of codes due to the hierarchical structure among…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-24 Hyeongju Kim, Junhyeok Lee, Jacob Morton, Juheon Lee, Jinhyeok Yang

ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs

Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-12 Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling

Variable Bitrate Residual Vector Quantization for Audio Coding

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of…

Sound · Computer Science 2025-04-29 Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji

Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression

The rapid growth of visual data under stringent storage and bandwidth constraints makes extremely low-bitrate image compression increasingly important. While Vector Quantization (VQ) offers strong structural fidelity, existing methods lack…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Shiyin Jiang, Wei Long, Minghao Han, Zhenghao Chen, Ce Zhu, Shuhang Gu

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

NVTC: Nonlinear Vector Transform Coding

In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding…

Computer Vision and Pattern Recognition · Computer Science 2023-05-26 Runsen Feng, Zongyu Guo, Weiping Li, Zhibo Chen

VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science 2026-05-08 Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Beyond Stationarity: Rethinking Codebook Collapse in Vector Quantization

Vector Quantization (VQ) underpins many modern generative frameworks such as VQ-VAE, VQ-GAN, and latent diffusion models. Yet, it suffers from the persistent problem of codebook collapse, where a large fraction of code vectors remains…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Hao Lu, Onur C. Koyun, Yongxin Guo, Zhengjie Zhu, Abbas Alili, Metin Nafi Gurcan

Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization

Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech…

Machine Learning · Computer Science 2023-10-05 Tanmay Gautam, Reid Pryzant, Ziyi Yang, Chenguang Zhu, Somayeh Sojoudi

Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-20 Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

Tensor Network Assisted Distributed Variational Quantum Algorithm for Large Scale Combinatorial Optimization Problem

Although quantum computing holds promise for solving Combinatorial Optimization Problems (COPs), the limited qubit capacity of NISQ hardware makes large-scale instances intractable. Conventional methods attempt to bridge this gap through…

Quantum Physics · Physics 2026-01-21 Yuhan Huang, Siyuan Jin, Yichi Zhang, Qi Zhao, Jun Qi, Qiming Shao

Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech

We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder…

Sound · Computer Science 2023-09-15 Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates

Neural Audio Codecs (NACs) have become increasingly adopted in speech processing tasks due to their excellent rate-distortion performance and compatibility with Large Language Models (LLMs) as discrete feature representations for audio…

Sound · Computer Science 2025-09-15 Harry Julian, Rachel Beeson, Lohith Konathala, Johanna Ulin, Jiameng Gao

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec

Neural audio codec (NAC) is essential for reconstructing high-quality speech signals and generating discrete representations for downstream speech language models. However, ensuring accurate semantic modeling while maintaining high-fidelity…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-03 Yanzhou Ren, Noboru Harada, Daiki Takeuchi, Siyu Chen, Wei Liu, Xiao Zhang, Liyuan Zhang, Takehiro Moriya, Shoji Makino

Channel-Aware Vector Quantization for Robust Semantic Communication on Discrete Channels

Deep learning-based semantic communication has largely relied on analog or semi-digital transmission, which limits compatibility with modern digital communication infrastructures. Recent studies have employed vector quantization (VQ) to…

Signal Processing · Electrical Eng. & Systems 2025-10-22 Zian Meng, Qiang Li, Wenqian Tang, Mingdie Yan, Xiaohu Ge

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit. It has been theoretically and empirically shown that discretization of representations leads to improved…

Machine Learning · Computer Science 2022-02-04 Dianbo Liu, Alex Lamb, Xu Ji, Pascal Notsawo, Mike Mozer, Yoshua Bengio, Kenji Kawaguchi

Individualized non-uniform quantization for vector search

Embedding vectors are widely used for representing unstructured data and searching through it for semantically similar items. However, the large size of these vectors, due to their high-dimensionality, creates problems for modern vector…

Machine Learning · Computer Science 2025-09-24 Mariano Tepper, Ted Willke

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

In this paper, we explore vector quantization for acoustic unit discovery. Leveraging unlabelled data, we aim to learn discrete representations of speech that separate phonetic content from speaker-specific details. We propose two neural…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-20 Benjamin van Niekerk, Leanne Nortje, Herman Kamper