Related papers: RepCodec: A Speech Representation Codec for Speech…

200 papers

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs…

Sound · Computer Science 2024-12-02 Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for downstream understanding and generation.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-12 Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan

In recent years, large language models have achieved significant success in generative tasks related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serve as an…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-05 Shengpeng Ji, Minghui Fang, Jialong Zuo, Ziyue Jiang, Dingdong Wang, Hanting Wang, Hai Huang, Zhou Zhao

Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains…

Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding…

Sound · Computer Science 2023-10-13 Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. They are a crucial component in generative tasks such as speech coding and large language models (LLMs). However, most…

Sound · Computer Science 2025-07-01 Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on…

Sound · Computer Science 2025-12-29 Liuyang Bai, Weiyi Lu, Li Guo

Current large speech language models are mainly based on semantic tokens from discretization of self-supervised learned representations and acoustic tokens from a neural codec, following a semantic-modeling and acoustic-synthesis paradigm.…

Sound · Computer Science 2025-10-16 Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

With the rapid advancement of large language models (LLMs), discrete speech representations have become crucial for integrating speech into LLMs. Existing methods for speech representation discretization rely on a predefined codebook size…

Sound · Computer Science 2025-01-03 Linqin Wang, Yaping Liu, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang, Wenjun Wang, Ling Dong

Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream…

Machine Learning · Computer Science 2025-07-10 Wonjin Jung, Sungil Kang, Dong-Yeon Cho

Discrete speech tokenization is a fundamental component in speech codecs. However, in large-scale speech-to-speech systems, the complexity of parallel streams from multiple quantizers and the computational cost of high-time-dimensional…

Sound · Computer Science 2025-07-28 Rongkun Xue, Yazhe Niu, Shuai Hu, Zixin Yin, Yongqiang Yao, Jing Yang

High-fidelity neural audio codecs in text-to-speech (TTS) aim to compress speech signals into discrete representations for faithful reconstruction. However, prior approaches faced challenges in effectively disentangling acoustic and…

Sound · Computer Science 2025-09-23 Ruonan Zhang, Xiaoyang Hao, Yichen Han, Junjie Cao, Yue Liu, Kai Zhang

Current speech large language models build upon discrete speech representations, which can be categorized into semantic tokens and acoustic tokens. However, existing speech tokens are not specifically designed for speech language modeling.…

Computation and Language · Computer Science 2024-01-24 Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu

Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous…

Machine Learning · Computer Science 2025-10-28 Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

Long speech sequences have troubled language model (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It…

Sound · Computer Science 2024-09-04 Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

The purpose of speech tokenization is to transform a speech signal into a sequence of discrete representations, serving as the foundation for speech language models (SLMs). While speech tokenization has many options, their effect on the…

Computation and Language · Computer Science 2025-06-03 Shunsuke Kando, Yusuke Miyao, Shinnosuke Takamichi

Neural audio codecs, used as speech tokenizers, have demonstrated remarkable potential in the field of speech generation. However, to ensure high-fidelity audio reconstruction, neural audio codecs typically encode audio into long sequences…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-02 Wenrui Liu, Qian Chen, Wen Wang, Yafeng Chen, Jin Xu, Zhifang Guo, Guanrou Yang, Weiqin Li, Xiaoda Yang, Tao Jin, Minghui Fang, Jialong Zuo, Bai Jionghao, Zemin Liu

Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated various compression-based speech tokenization…

Computation and Language · Computer Science 2025-05-22 Richard He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly

Recent advancements in audio language models have underscored the pivotal role of audio tokenization, which converts audio signals into discrete tokens, thereby facilitating the application of language model architectures to the audio…
