Related papers: Vector Quantized Diffusion Model Based Speech Band…

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth range of speech signals, enhancing the speech quality towards brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-17 Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec

Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-18 Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

With recent advances of diffusion model, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE researches primarily focus on fixed upsampling ratios, disregarding the fact…

Sound · Computer Science 2023-12-22 Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Different from conventional BWE methods which predict spectral parameters for…

Sound · Computer Science 2018-01-26 Zhen-Hua Ling, Yang Ai, Yu Gu, Li-Rong Dai

CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

Speech Bandwidth Extension improves clarity and intelligibility by restoring/inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or waveform modeling, which can incur higher…

Sound · Computer Science 2026-03-04 Bowen Zhang, Junchuan Zhao, Ian McLoughlin, Ye Wang, A S Madhukumar

Absorbing Discrete Diffusion for Speech Enhancement

Inspired by recent developments in neural speech coding and diffusion-based language modeling, we tackle speech enhancement by modeling the conditional distribution of clean speech codes given noisy speech codes using absorbing discrete…

Sound · Computer Science 2026-02-27 Philippe Gonzalez

UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

In practical application of speech codecs, a multitude of factors such as the quality of the radio connection, limiting hardware or required user experience necessitate trade-offs between achievable perceptual quality, engendered bitrate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-25 Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science 2024-10-21 Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations

Neural audio codecs (NACs), which use neural networks to generate compact audio representations, have garnered interest for their applicability to many downstream tasks -- especially quantized codecs due to their compatibility with large…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-13 Ryo Aihara, Yoshiki Masuyama, Gordon Wichern, François G. Germain, Jonathan Le Roux

Restoring degraded speech via a modified diffusion model

There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave…

Sound · Computer Science 2021-09-03 Jianwei Zhang, Suren Jayasuriya, Visar Berisha

Speech bandwidth extension with WaveNet

Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, resulting in characteristic "telephone-quality" audio. While higher quality codecs exist, due to the scale and heterogeneity…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-12 Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters

Speaker anonymization using neural audio codec language models

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-15 Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

Pre-training Feature Guided Diffusion Model for Speech Enhancement

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model…

Sound · Computer Science 2024-06-13 Yiyuan Yang, Niki Trigoni, Andrew Markham

Speech Enhancement with Multi-granularity Vector Quantization

With advances in deep learning, neural network based speech enhancement (SE) has developed rapidly in the last decade. Meanwhile, the self-supervised pre-trained model and vector quantization (VQ) have achieved excellent performance on many…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-17 Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang

Fast and Flexible Audio Bandwidth Extension via Vocos

We propose a Vocos-based bandwidth extension model that enhances audio at 8-48 kHz by generating missing high-frequency content. Inputs are resampled to 48 kHz and processed by a neural vocoder backbone, enabling a single network to support…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-10 Yatharth Sharma

NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization

Built upon vector quantization (VQ), discrete audio codec models have achieved great success in audio compression and auto-regressive audio generation. However, existing models face substantial challenges in perceptual quality and signal…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-20 Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu

DisContSE: Single-Step Diffusion Speech Enhancement Based on Joint Discrete and Continuous Embeddings

Diffusion speech enhancement on discrete audio codec features gain immense attention due to their improved speech component reconstruction capability. However, they usually suffer from high inference computational complexity due to multiple…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-30 Yihui Fu, Tim Fingscheidt

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control

The majority of existing speech bandwidth extension (BWE) methods operate under the constraint of fixed source and target sampling rates, which limits their flexibility in practical applications. In this paper, we propose a multi-stage…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Ye-Xin Lu, Yang Ai, Zheng-Yan Sheng, Zhen-Hua Ling

Harmonic-Percussive Disentangled Neural Audio Codec for Bandwidth Extension

Bandwidth extension, the task of reconstructing the high-frequency components of an audio signal from its low-pass counterpart, is a long-standing problem in audio processing. While traditional approaches have evolved alongside the broader…

Sound · Computer Science 2025-11-27 Benoît Giniès, Xiaoyu Bie, Olivier Fercoq, Gaël Richard