Related papers: Practical cognitive speech compression

A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication

While existing speech audio codecs designed for compression exploit limited forms of temporal redundancy and allow for multi-scale representations, they tend to represent all features of audio in the same way. In contrast, generative voice…

Sound · Computer Science · 2025-09-22 · Ryan Collette, Ross Greenwood, Serena Nicoll

A High Fidelity and Low Complexity Neural Audio Coding

Audio coding is an essential module in the real-time communication system. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor…

Sound · Computer Science · 2023-10-18 · Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While…

Sound · Computer Science · 2022-07-07 · Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science · 2024-10-22 · Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

Variational Speech Waveform Compression to Catalyze Semantic Communications

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and…

Sound · Computer Science · 2022-12-14 · Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

RADE: A Neural Codec for Transmitting Speech over HF Radio Channels

Speech compression is commonly used to send voice over radio channels in applications such as mobile telephony and two-way push-to-talk (PTT) radio. In classical systems, the speech codec is combined with forward error correction,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-07-29 · David Rowe, Jean-Marc Valin

Rate-Aware Learned Speech Compression

The rapid rise of real-time communication and large language models has significantly increased the importance of speech compression. Deep learning-based neural speech codecs have outperformed traditional signal-level speech codecs in terms…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-22 · Jun Xu, Zhengxue Cheng, Guangchuan Chi, Yuhan Liu, Yuelin Hu, Li Song

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders

This study compares the performances of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates are…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-27 · Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

End-to-End Optimized Speech Coding with Deep Neural Networks

Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model which…

Sound · Computer Science · 2021-07-09 · Srihari Kankanahalli

Cognitive Coding of Speech

We propose an approach for cognitive coding of speech by unsupervised extraction of contextual representations in two hierarchical levels of abstraction. Speech attributes such as phoneme identity that last one hundred milliseconds or less…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-11 · Reza Lotfidereshgi, Philippe Gournay

Analysis by Adversarial Synthesis -- A Novel Approach for Speech Vocoding

Classical parametric speech coding techniques provide a compact representation for speech signals. This affords a very low transmission rate but with a reduced perceptual quality of the reconstructed signals. Recently, autoregressive deep…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-02 · Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders, at the cost of very high complexity. In this work, we…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-01 · Jean-Marc Valin, Jan Skoglund

Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-30 · Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beak, Minje Kim

Speech Separation using Neural Audio Codecs with Embedding Loss

Neural audio codecs have revolutionized audio processing by enabling speech tasks to be performed on highly compressed representations. Recent work has shown that speech separation can be achieved within these compressed domains, offering…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-28 · Jia Qi Yip, Chin Yuen Kwok, Bin Ma, Eng Siong Chng

Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos

Talking head video compression has advanced with neural rendering and keypoint-based methods, but challenges remain, especially at low bit rates, including handling large head movements, suboptimal lip synchronization, and distorted facial…

Image and Video Processing · Electrical Eng. & Systems · 2025-06-17 · Riku Takahashi, Ryugo Morita, Jinjia Zhou

Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for…

Multimedia · Computer Science · 2019-05-15 · Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec

Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-01-21 · Leyan Yang, Ronghui Hu, Yang Xu, Jing Lu

HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

Discrete speech tokenization is a fundamental component in speech codecs. However, in large-scale speech-to-speech systems, the complexity of parallel streams from multiple quantizers and the computational cost of high-time-dimensional…

Sound · Computer Science · 2025-07-28 · Rongkun Xue, Yazhe Niu, Shuai Hu, Zixin Yin, Yongqiang Yao, Jing Yang

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit compression performance superior to that of conventional methods. Despite significant progress, existing methods still have limitations in…

Sound · Computer Science · 2024-07-31 · Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-20 · Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus