Related papers: RepCodec: A Speech Representation Codec for Speech…

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs…

Sound · Computer Science 2024-12-02 Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for downstream understanding and generation.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-12 Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models

In recent years, large language models have achieved significant success in generative tasks related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serve as an…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-05 Shengpeng Ji, Minghui Fang, Jialong Zuo, Ziyue Jiang, Dingdong Wang, Hanting Wang, Hai Huang, Zhou Zhao

DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains…

Computation and Language · Computer Science 2025-09-30 Md Mubtasim Ahasan, Md Fahim, Tasnim Mohiuddin, A K M Mahbubur Rahman, Aman Chadha, Tariq Iqbal, M Ashraful Amin, Md Mofijul Islam, Amin Ahsan Ali

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding…

Sound · Computer Science 2023-10-13 Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

FreeCodec: A disentangled neural speech codec with fewer tokens

Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most…

Sound · Computer Science 2025-07-01 Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

Semantic Codebooks as Effective Priors for Neural Speech Compression

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on…

Sound · Computer Science 2025-12-29 Liuyang Bai, Weiyi Lu, Li Guo

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations

Current large speech language models are mainly based on semantic tokens from discretization of self-supervised learned representations and acoustic tokens from a neural codec, following a semantic-modeling and acoustic-synthesis paradigm.…

Sound · Computer Science 2025-10-16 Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models

With the rapid advancement of large language models (LLMs), discrete speech representations have become crucial for integrating speech into LLMs. Existing methods for speech representation discretization rely on a predefined codebook size…

Sound · Computer Science 2025-01-03 Linqin Wang, Yaping Liu, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang, Wenjun Wang, Ling Dong

Speech Tokenizer is Key to Consistent Representation

Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream…

Machine Learning · Computer Science 2025-07-10 Wonjin Jung, Sungil Kang, Dong-Yeon Cho

HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

Discrete speech tokenization is a fundamental component in speech codecs. However, in large-scale speech-to-speech systems, the complexity of parallel streams from multiple quantizers and the computational cost of high-time-dimensional…

Sound · Computer Science 2025-07-28 Rongkun Xue, Yazhe Niu, Shuai Hu, Zixin Yin, Yongqiang Yao, Jing Yang

MBCodec: Thorough disentangle for high-fidelity audio compression

High-fidelity neural audio codecs in Text-to-speech (TTS) aim to compress speech signals into discrete representations for faithful reconstruction. However, prior approaches faced challenges in effectively disentangling acoustic and…

Sound · Computer Science 2025-09-23 Ruonan Zhang, Xiaoyang Hao, Yichen Han, Junjie Cao, Yue Liu, Kai Zhang

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Current speech large language models build upon discrete speech representations, which can be categorized into semantic tokens and acoustic tokens. However, existing speech tokens are not specifically designed for speech language modeling.…

Computation and Language · Computer Science 2024-01-24 Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous…

Machine Learning · Computer Science 2025-10-28 Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It…

Sound · Computer Science 2024-09-04 Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

The purpose of speech tokenization is to transform a speech signal into a sequence of discrete representations, serving as the foundation for speech language models (SLMs). While speech tokenization has many options, their effect on the…

Computation and Language · Computer Science 2025-06-03 Shunsuke Kando, Yusuke Miyao, Shinnosuke Takamichi

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

Neural audio codecs, used as speech tokenizers, have demonstrated remarkable potential in the field of speech generation. However, to ensure high-fidelity audio reconstruction, neural audio codecs typically encode audio into long sequences…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-02 Wenrui Liu, Qian Chen, Wen Wang, Yafeng Chen, Jin Xu, Zhifang Guo, Guanrou Yang, Weiqin Li, Xiaoda Yang, Tao Jin, Minghui Fang, Jialong Zuo, Bai Jionghao, Zemin Liu

dMel: Speech Tokenization made Simple

Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated various compression-based speech tokenization…

Computation and Language · Computer Science 2025-05-22 Richard He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling

Recent advancements in audio language models have underscored the pivotal role of audio tokenization, which converts audio signals into discrete tokens, thereby facilitating the application of language model architectures to the audio…

Sound · Computer Science 2025-04-15 Dongchao Yang, Songxiang Liu, Haohan Guo, Jiankun Zhao, Yuanyuan Wang, Helin Wang, Zeqian Ju, Xubo Liu, Xueyuan Chen, Xu Tan, Xixin Wu, Helen Meng