Vector Quantized Diffusion Model Based Speech Bandwidth Extension

Yuan Fang; Jinglin Bai; Jiajie Wang; Xueliang Zhang

Sound; Audio and Speech Processing. 2024-09-17 (v2)

Abstract

Recent advances in neural audio codecs (NACs) have unlocked new potential in audio signal processing, and studies have increasingly explored the latent features of NACs for various speech signal processing tasks. This paper introduces the first approach to speech bandwidth extension (BWE) that uses the discrete features obtained from a NAC. By restoring high-frequency details within highly compressed discrete tokens, the approach enhances speech intelligibility and naturalness. Built on Vector Quantized Diffusion, the proposed framework combines the strengths of an advanced NAC, diffusion models, and Mamba-2 to reconstruct high-frequency speech components. Extensive experiments show that the method achieves superior performance on both log-spectral distance and ViSQOL, significantly improving speech quality.
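For readers unfamiliar with log-spectral distance (LSD), one of the two metrics named in the abstract, the following is a minimal sketch of a commonly used formulation: the per-frame root-mean-square difference between log power spectra, averaged over frames. The window length, hop size, log base, and the `eps` floor are assumptions chosen for illustration and are not taken from the paper; the function names are hypothetical.

```python
import numpy as np


def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram from a Hann-windowed STFT, shape (frames, bins)."""
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(ref, est, n_fft=2048, hop=512, eps=1e-10):
    """LSD between a reference and an estimate (lower is better).

    Assumed settings: log10 of the power spectrum, with eps added
    to avoid log(0). Other papers may use different conventions.
    """
    n = min(len(ref), len(est))  # compare only the overlapping samples
    log_ref = np.log10(stft_power(ref[:n], n_fft, hop) + eps)
    log_est = np.log10(stft_power(est[:n], n_fft, hop) + eps)
    # RMS over frequency bins within each frame, then mean over frames.
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Toy "wideband" signal: a low tone plus a high-frequency tone.
    wide = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
    # Toy "narrowband" signal: the high-frequency tone is missing.
    narrow = np.sin(2 * np.pi * 440 * t)
    print(log_spectral_distance(wide, narrow))
```

In this toy setup the distance between a signal and itself is zero, and it grows when the high-frequency component is missing, which is the kind of gap a BWE system aims to close.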

Keywords

speech recognition

Cite

@article{arxiv.2409.05784,
  title  = {Vector Quantized Diffusion Model Based Speech Bandwidth Extension},
  author = {Yuan Fang and Jinglin Bai and Jiajie Wang and Xueliang Zhang},
  journal= {arXiv preprint arXiv:2409.05784},
  year   = {2024}
}

Comments

4 pages
