Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization
Abstract
Scalability and efficiency are desired in neural speech codecs, which must support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC into a neural network, but bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific, digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models have fewer than 1 million parameters, significantly fewer than many other generative models.
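For readers unfamiliar with the two signals the abstract says are quantized, the following is a minimal sketch of classical LPC analysis and residual computation. It is not the CQ method: CQ learns its quantization jointly inside a neural network, whereas this sketch uses the textbook autocorrelation method with the Levinson-Durbin recursion and a plain uniform quantizer. The function names, the frame length, the LPC order of 8, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a with a[0] == 1, so the residual is e[n] = sum_k a[k] * x[n-k].
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # degenerate (e.g. all-zero) frame; stop early
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(frame, a):
    """Apply the analysis filter A(z) with zero initial state."""
    return np.convolve(frame, a)[: len(frame)]


def lpc_synthesize(residual, a):
    """Apply the synthesis filter 1/A(z) with zero initial state."""
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        past = sum(a[k] * out[n - k] for k in range(1, min(order, n) + 1))
        out[n] = residual[n] - past
    return out


def uniform_quantize(x, step):
    """Plain uniform scalar quantizer (illustrative; not CQ's learned quantizer)."""
    return step * np.round(x / step)


# Illustrative frame: a noisy sinusoid standing in for a speech frame.
rng = np.random.default_rng(0)
t = np.arange(256)
frame = np.sin(0.1 * t) + 0.05 * rng.standard_normal(256)

a = lpc_coefficients(frame, order=8)
residual = lpc_residual(frame, a)
decoded = lpc_synthesize(uniform_quantize(residual, step=0.05), a)
```

In this classical pipeline the LPC coefficients and the residual are quantized independently of each other. The abstract's point is that CQ instead learns the two quantizers together.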
Cite
@article{arxiv.2002.05604,
title = {Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization},
author = {Kai Zhen and Mi Suk Lee and Jongmo Sung and Seungkwon Beack and Minje Kim},
journal = {arXiv preprint arXiv:2002.05604},
year = {2020}
}
Comments
Accepted in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020