Related papers: Rate-Aware Learned Speech Compression

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science · 2024-10-22 · Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection

Neural audio codecs discretize speech via residual vector quantization (RVQ), forming a coarse-to-fine hierarchy across quantizers. While codec models have been explored for representation learning, their discrete structure remains…

Sound · Computer Science · 2026-03-19 · Jinyang Wu, Zihan Pan, Qiquan Zhang, Sailor Hardik Bhupendra, Soumik Mondal

Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding

The ever-growing size of neural networks poses serious challenges on resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays…

Machine Learning · Computer Science · 2025-05-27 · Alexander Conzelmann, Robert Bamler

Deep Representation Decomposition for Rate-Invariant Speaker Verification

While promising performance for speaker verification has been achieved by deep speaker embeddings, the advantage would reduce in the case of speaking-style variability. Speaking rate mismatch is often observed in practical speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-05-31 · Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

Learned Image Compression with Dual-Branch Encoder and Conditional Information Coding

Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the…

Applications · Statistics · 2024-03-25 · Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han

Practical cognitive speech compression

This paper presents a new neural speech compression method that is practical in the sense that it operates at low bitrate, introduces a low latency, is compatible in computational complexity with current mobile devices, and provides a…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-10 · Reza Lotfidereshgi, Philippe Gournay

Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-20 · Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

Multi-rate adaptive transform coding for video compression

Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression. Despite impressive performance of…

Image and Video Processing · Electrical Eng. & Systems · 2023-02-21 · Lyndon R. Duong, Bohan Li, Cheng Chen, Jingning Han

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science · 2022-07-07 · Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for…

Multimedia · Computer Science · 2019-05-15 · Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades…

Sound · Computer Science · 2025-06-23 · Yunkee Chae, Kyogu Lee

Channel-wise Autoregressive Entropy Models for Learned Image Compression

In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective. Currently, the most effective learned image codecs take the form of an entropy-constrained…

Image and Video Processing · Electrical Eng. & Systems · 2020-07-20 · David Minnen, Saurabh Singh

RepCodec: A Speech Representation Codec for Speech Tokenization

With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-07-23 · Zhichao Huang, Chutong Meng, Tom Ko

Variational Speech Waveform Compression to Catalyze Semantic Communications

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and…

Sound · Computer Science · 2022-12-14 · Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network

Neural Video Compression (NVC) has achieved remarkable performance in recent years. However, precise rate control remains a challenge due to the inherent limitations of learning-based codecs. To solve this issue, we propose a dynamic video…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-29 · Chenhao Zhang, Wei Gao

Rate-aware Compression for NeRF-based Volumetric Video

The neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, the existing…

Multimedia · Computer Science · 2024-11-11 · Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song

Entropy-based Coarse and Compressed Semantic Speech Representation Learning

Discrete speech representation learning has recently attracted increasing interest in both acoustic and semantic modeling. Existing approaches typically encode 16 kHz waveforms into discrete tokens at a rate of 25 or 50 tokens per second.…

Computation and Language · Computer Science · 2025-09-03 · Jialong Zuo, Guangyan Zhang, Minghui Fang, Shengpeng Ji, Xiaoqi Jiao, Jingyu Li, Yiwen Guo, Zhou Zhao

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally…

Sound · Computer Science · 2025-05-28 · Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder in a single network. A challenging problem as such results in unwelcome complexity increase and inferior…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-11-16 · Haici Yang, Inseon Jang, Minje Kim

Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation

The application of the context-adaptive entropy model significantly improves the rate-distortion (R-D) performance, in which hyperpriors and autoregressive models are jointly utilized to effectively capture the spatial redundancy of the…

Image and Video Processing · Electrical Eng. & Systems · 2022-09-09 · Haisheng Fu, Feng Liang
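Several of the speech-codec entries in this list build on residual vector quantization (RVQ), in which each quantizer stage encodes the residual left by the stages before it, producing the coarse-to-fine hierarchy across quantizers that those abstracts describe. The following is a minimal sketch of that idea, not the method of any listed paper. The codebooks are tiny hand-picked toy values (real codecs learn them), and the helper names (`rvq_encode`, `rvq_decode`, `bitrate_bps`) are hypothetical. The bitrate helper assumes fixed-length indices with no entropy coding, so dropping stages lowers the rate at the cost of higher reconstruction error.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(codebooks: Sequence[Sequence[Vector]], x: Vector, n_q: int) -> List[int]:
    """Encode x with the first n_q stages; each stage quantizes the running residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks[:n_q]:
        i = nearest(codebook, residual)
        indices.append(i)
        # Subtract the chosen codeword; the next (finer) stage sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(codebooks: Sequence[Sequence[Vector]], indices: Sequence[int]) -> Vector:
    """Reconstruction is the sum of the selected codewords, coarse to fine."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, i in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


def sq_error(x: Vector, x_hat: Vector) -> float:
    """Squared Euclidean distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def bitrate_bps(frame_rate_hz: float, n_q: int, codebook_size: int) -> float:
    """Bits per second if each frame sends n_q fixed-length indices (no entropy coding)."""
    return frame_rate_hz * n_q * math.log2(codebook_size)


# Toy 2-D codebooks (hypothetical values): stage 0 is coarse, stage 1 is finer.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.25], [-0.25, -0.25], [0.25, -0.25]],
]
x = [1.2, 0.8]

for n_q in (1, 2):
    idx = rvq_encode(codebooks, x, n_q)
    x_hat = rvq_decode(codebooks, idx)
    print(n_q, idx, round(sq_error(x, x_hat), 4))
# prints "1 [1] 0.08" then "2 [1, 3] 0.005"

# Example operating point (assumed numbers): 50 frames/s, 8 stages, 1024-entry codebooks.
print(bitrate_bps(50, 8, 1024))  # prints 4000.0
```

Truncating the stage list at decode time, as variable-bitrate RVQ schemes do, trades rate for distortion without retraining the earlier stages; in this toy example, the second stage cuts the squared error from about 0.08 to about 0.005.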