Related papers: Rate-Aware Learned Speech Compression

200 papers

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science · 2024-10-22 · Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Neural audio codecs discretize speech via residual vector quantization (RVQ), forming a coarse-to-fine hierarchy across quantizers. While codec models have been explored for representation learning, their discrete structure remains…

Sound · Computer Science · 2026-03-19 · Jinyang Wu, Zihan Pan, Qiquan Zhang, Sailor Hardik Bhupendra, Soumik Mondal

The ever-growing size of neural networks poses serious challenges for resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays…

Machine Learning · Computer Science · 2025-05-27 · Alexander Conzelmann, Robert Bamler

While deep speaker embeddings have achieved promising performance for speaker verification, this advantage diminishes under speaking-style variability. Speaking rate mismatch is often observed in practical speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-05-31 · Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the…

Applications · Statistics · 2024-03-25 · Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han

This paper presents a new neural speech compression method that is practical in the sense that it operates at low bitrate, introduces a low latency, is compatible in computational complexity with current mobile devices, and provides a…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-10 · Reza Lotfidereshgi, Philippe Gournay

Neural audio coding has emerged as a vibrant research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-20 · Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression. Despite impressive performance of…

Image and Video Processing · Electrical Eng. & Systems · 2023-02-21 · Lyndon R. Duong, Bohan Li, Cheng Chen, Jingning Han

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science · 2022-07-07 · Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for…

Multimedia · Computer Science · 2019-05-15 · Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades…

Sound · Computer Science · 2025-06-23 · Yunkee Chae, Kyogu Lee

In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective. Currently, the most effective learned image codecs take the form of an entropy-constrained…

Image and Video Processing · Electrical Eng. & Systems · 2020-07-20 · David Minnen, Saurabh Singh

With the recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role in injecting speech into LLMs. However, this discretization causes a loss of information, consequently impairing…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-07-23 · Zhichao Huang, Chutong Meng, Tom Ko

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and…

Sound · Computer Science · 2022-12-14 · Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Neural Video Compression (NVC) has achieved remarkable performance in recent years. However, precise rate control remains a challenge due to the inherent limitations of learning-based codecs. To solve this issue, we propose a dynamic video…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-29 · Chenhao Zhang, Wei Gao

Neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, the existing…

Multimedia · Computer Science · 2024-11-11 · Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song

Discrete speech representation learning has recently attracted increasing interest in both acoustic and semantic modeling. Existing approaches typically encode 16 kHz waveforms into discrete tokens at a rate of 25 or 50 tokens per second.…

Computation and Language · Computer Science · 2025-09-03 · Jialong Zuo, Guangyan Zhang, Minghui Fang, Shengpeng Ji, Xiaoqi Jiao, Jingyu Li, Yiwen Guo, Zhou Zhao

Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally…

In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder in a single network. Such a challenging task results in an unwelcome increase in complexity and inferior…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-11-16 · Haici Yang, Inseon Jang, Minje Kim

The application of the context-adaptive entropy model significantly improves the rate-distortion (R-D) performance, in which hyperpriors and autoregressive models are jointly utilized to effectively capture the spatial redundancy of the…

Image and Video Processing · Electrical Eng. & Systems · 2022-09-09 · Haisheng Fu, Feng Liang