Related papers

Related papers: Vector Quantized Diffusion Model Based Speech Band…

200 papers

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, making speech sound brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-12-17 · Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-03-18 · Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its strong potential for handling unseen test noise. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-05 · Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE studies focus on fixed upsampling ratios, disregarding the fact…

Sound · Computer Science · 2023-12-22 · Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for…

Sound · Computer Science · 2018-01-26 · Zhen-Hua Ling, Yang Ai, Yu Gu, Li-Rong Dai

Speech bandwidth extension improves clarity and intelligibility by restoring or inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or waveform modeling, which can incur higher…

Sound · Computer Science · 2026-03-04 · Bowen Zhang, Junchuan Zhao, Ian McLoughlin, Ye Wang, A S Madhukumar

Inspired by recent developments in neural speech coding and diffusion-based language modeling, we tackle speech enhancement by modeling the conditional distribution of clean speech codes given noisy speech codes using absorbing discrete…

Sound · Computer Science · 2026-02-27 · Philippe Gonzalez

In practical applications of speech codecs, a multitude of factors such as the quality of the radio connection, limiting hardware or required user experience necessitate trade-offs between achievable perceptual quality, engendered bitrate…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-25 · Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual…

Sound · Computer Science · 2024-10-21 · Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Neural audio codecs (NACs), which use neural networks to generate compact audio representations, have garnered interest for their applicability to many downstream tasks -- especially quantized codecs due to their compatibility with large…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-13 · Ryo Aihara, Yoshiki Masuyama, Gordon Wichern, François G. Germain, Jonathan Le Roux

There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave…

Sound · Computer Science · 2021-09-03 · Jianwei Zhang, Suren Jayasuriya, Visar Berisha

Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, resulting in characteristic "telephone-quality" audio. While higher quality codecs exist, due to the scale and heterogeneity…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-12 · Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-15 · Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, enhancing communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model…

Sound · Computer Science · 2024-06-13 · Yiyuan Yang, Niki Trigoni, Andrew Markham

With advances in deep learning, neural network based speech enhancement (SE) has developed rapidly in the last decade. Meanwhile, self-supervised pre-trained models and vector quantization (VQ) have achieved excellent performance on many…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-02-17 · Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang

We propose a Vocos-based bandwidth extension model that enhances audio at 8-48 kHz by generating missing high-frequency content. Inputs are resampled to 48 kHz and processed by a neural vocoder backbone, enabling a single network to support…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-03-10 · Yatharth Sharma

Built upon vector quantization (VQ), discrete audio codec models have achieved great success in audio compression and auto-regressive audio generation. However, existing models face substantial challenges in perceptual quality and signal…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-20 · Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu

Diffusion-based speech enhancement on discrete audio codec features has gained immense attention due to its improved speech component reconstruction capability. However, such methods usually suffer from high inference computational complexity due to multiple…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-01-30 · Yihui Fu, Tim Fingscheidt

The majority of existing speech bandwidth extension (BWE) methods operate under the constraint of fixed source and target sampling rates, which limits their flexibility in practical applications. In this paper, we propose a multi-stage…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-05 · Ye-Xin Lu, Yang Ai, Zheng-Yan Sheng, Zhen-Hua Ling

Bandwidth extension, the task of reconstructing the high-frequency components of an audio signal from its low-pass counterpart, is a long-standing problem in audio processing. While traditional approaches have evolved alongside the broader…

Sound · Computer Science · 2025-11-27 · Benoît Giniès, Xiaoyu Bie, Olivier Fercoq, Gaël Richard