Related papers: Optimizing Neural Speech Codec for Low-Bitrate Com…

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Although discrete speech tokens have exhibited strong potential for language model-based speech generation, their high bitrates and redundant timbre information restrict the development of such models. In this work, we propose LSCodec, a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-22 Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu

FreeCodec: A disentangled neural speech codec with fewer tokens

Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most…

Sound · Computer Science 2025-07-01 Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement

Audio codecs are a critical component of modern speech generation systems. This paper introduces a low-bitrate, multi-scale residual codec that encodes speech into four distinct streams: semantic, timbre, prosody, and residual. This…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-25 Jingyu Li, Guangyan Zhang, Zhen Ye, Yiwen Guo

RepCodec: A Speech Representation Codec for Speech Tokenization

With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-23 Zhichao Huang, Chutong Meng, Tom Ko

MBCodec: Thorough disentangle for high-fidelity audio compression

High-fidelity neural audio codecs in Text-to-speech (TTS) aim to compress speech signals into discrete representations for faithful reconstruction. However, prior approaches faced challenges in effectively disentangling acoustic and…

Sound · Computer Science 2025-09-23 Ruonan Zhang, Xiaoyang Hao, Yichen Han, Junjie Cao, Yue Liu, Kai Zhang

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction

Neural audio codecs have recently enabled high-fidelity reconstruction at high compression rates, especially for speech. However, speech and non-speech audio exhibit fundamentally different spectral characteristics: speech energy…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Haoran Wang, Jiatong Shi, Jinchuan Tian, Bohan Li, Kai Yu, Shinji Watanabe

A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication

While existing speech audio codecs designed for compression exploit limited forms of temporal redundancy and allow for multi-scale representations, they tend to represent all features of audio in the same way. In contrast, generative voice…

Sound · Computer Science 2025-09-22 Ryan Collette, Ross Greenwood, Serena Nicoll

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs…

Sound · Computer Science 2024-10-04 Yuzhe Gu, Enmao Diao

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance significantly deteriorates at low bitrates (around 1 kbps). Although a low bitrate inherently…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-10 Detai Xin, Xu Tan, Shinnosuke Takamichi, Hiroshi Saruwatari

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec

Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-21 Leyan Yang, Ronghui Hu, Yang Xu, Jing Lu

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models

In recent years, large language models have achieved significant success in generative tasks related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serve as an…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-05 Shengpeng Ji, Minghui Fang, Jialong Zuo, Ziyue Jiang, Dingdong Wang, Hanting Wang, Hai Huang, Zhou Zhao

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science 2022-07-07 Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous…

Machine Learning · Computer Science 2025-10-28 Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available…

Sound · Computer Science 2026-05-08 Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

We introduce LMCodec, a causal neural speech codec that provides high quality audio at very low bitrates. The backbone of the system is a causal convolutional codec that encodes audio into a hierarchy of coarse-to-fine tokens using residual…

Sound · Computer Science 2023-03-24 Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, Neil Zeghidour, Marco Tagliasacchi

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit superior compression performance than conventional methods. Despite significant progress, existing methods still have limitations in…

Sound · Computer Science 2024-07-31 Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs…

Sound · Computer Science 2024-12-02 Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

Neural audio codecs form the foundational building blocks for language model (LM)-based speech generation. Typically, there is a trade-off between frame rate and audio quality. This study introduces a low-frame-rate, semantically enhanced…

Sound · Computer Science 2025-10-02 Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu

Fewer-token Neural Speech Codec with Time-invariant Codes

Language model based text-to-speech (TTS) models, like VALL-E, have gained attention for their outstanding in-context learning capability in zero-shot scenarios. Neural speech codec is a critical component of these models, which can convert…

Sound · Computer Science 2024-03-12 Yong Ren, Tao Wang, Jiangyan Yi, Le Xu, Jianhua Tao, Chuyuan Zhang, Junzuo Zhou