PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models

Jayneel Vora; Aditya Krishnan; Nader Bouacida; Prabhu RV Shankar; Prasant Mohapatra

Sound 2024-09-24 v1 Machine Learning Audio and Speech Processing

Abstract

Denoising diffusion models have emerged as the state of the art in generative tasks across image, audio, and video domains, producing high-quality, diverse, and contextually relevant data. However, their broader adoption is limited by high computational costs and large memory footprints. Post-training quantization (PTQ) offers a promising way to mitigate these challenges by reducing model complexity through low-bit-width parameters. Yet applying PTQ directly to diffusion models can degrade synthesis quality, because quantization noise accumulates across the many denoising steps, particularly in conditional tasks such as text-to-audio synthesis. This work introduces PTQ4ADM, a novel framework for quantizing audio diffusion models (ADMs). Our key contributions are (1) a coverage-driven prompt augmentation method and (2) an activation-aware calibration set generation algorithm for text-conditional ADMs. These techniques ensure comprehensive coverage of audio aspects and modalities while preserving synthesis fidelity. We validate our approach on the TANGO, Make-An-Audio, and AudioLDM models for text-conditional audio generation. Extensive experiments demonstrate that PTQ4ADM can reduce model size by up to 70% while achieving synthesis quality metrics comparable to those of full-precision models (less than a 5% increase in FD scores). We show that specific layers in the backbone network can be quantized to 4-bit weights and 8-bit activations without significant quality loss. This work paves the way for more efficient deployment of ADMs in resource-constrained environments.
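
To make the bit-width trade-off mentioned in the abstract concrete, the following is a minimal sketch of generic symmetric uniform quantization with round-to-nearest. It is not the PTQ4ADM method: the function names, the single per-tensor scale, and the sample weight values are all assumptions chosen for illustration, and the abstract does not specify the quantizer the authors use.

```python
# Illustrative sketch only: generic symmetric uniform quantization.
# This is NOT the PTQ4ADM algorithm; all names and values are hypothetical.
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integer codes using one shared scale (per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def max_abs_error(values: List[float], num_bits: int) -> float:
    """Largest absolute difference between a value and its quantized version."""
    codes, scale = quantize_symmetric(values, num_bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical weights standing in for one layer's parameters.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.0]

err4 = max_abs_error(weights, 4)   # coarser grid, larger rounding error
err8 = max_abs_error(weights, 8)   # finer grid, smaller rounding error
print(f"4-bit max error: {err4:.4f}, 8-bit max error: {err8:.4f}")
```

In this toy setting the 4-bit grid has far fewer levels than the 8-bit grid, so its rounding error is larger. In a diffusion model such errors are reintroduced at every denoising step, which is the accumulation problem the abstract describes.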

Keywords

quantization, audio generation, diffusion model

Cite

@article{arxiv.2409.13894,
  title  = {PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models},
  author = {Jayneel Vora and Aditya Krishnan and Nader Bouacida and Prabhu RV Shankar and Prasant Mohapatra},
  journal= {arXiv preprint arXiv:2409.13894},
  year   = {2024}
}
