Example-Based Framework for Perceptually Guided Audio Texture Generation

Purnima Kamath; Chitralekha Gupta; Lonce Wyse; Suranga Nanayakkara

doi:10.1109/TASLP.2024.3393741

Subjects: Audio and Speech Processing; Artificial Intelligence; Sound
Version: v2 (2024-10-08)

Authors: Purnima Kamath, Chitralekha Gupta, Lonce Wyse, Suranga Nanayakkara

Abstract

Controllable generation with StyleGANs is usually achieved by training the model on labeled data. For audio textures, however, large semantically labeled datasets are currently lacking. We therefore develop a method for semantic control over an unconditionally trained StyleGAN that does not require such datasets. Specifically, we propose an example-based framework that determines guidance vectors for audio texture generation from user-defined semantic attributes. Our approach leverages the semantically disentangled latent space of an unconditionally trained StyleGAN. Using a few synthetic examples that indicate the presence or absence of a semantic attribute, we infer guidance vectors in the StyleGAN's latent space that control that attribute during generation. Our results show that the framework can find user-defined, perceptually relevant guidance vectors for controllable generation of audio textures. Furthermore, we demonstrate an application of the framework to other tasks, such as selective semantic attribute transfer.
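As a rough illustration of the idea of inferring a guidance vector from a few examples, the sketch below estimates a direction in a latent space as the normalized difference between the mean latent code of examples that have an attribute and the mean of examples that lack it, then shifts a latent code along that direction. This is only a minimal sketch under stated assumptions, not the authors' method as described in the abstract: the difference-of-means estimator, the function names (`mean_vector`, `guidance_vector`, `apply_guidance`), and the toy 3-dimensional latent codes are all hypothetical. In practice the latent codes would presumably come from mapping example sounds into the StyleGAN's latent space, which is not shown here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty set of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def guidance_vector(with_attr: Sequence[Sequence[float]],
                    without_attr: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical estimator: unit-length difference of the two class means.

    Assumption for illustration only; the abstract does not specify
    how the guidance vectors are actually computed.
    """
    mu_pos = mean_vector(with_attr)
    mu_neg = mean_vector(without_attr)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("example sets have identical means; no direction found")
    return [d / norm for d in diff]


def apply_guidance(latent: Sequence[float],
                   direction: Sequence[float],
                   strength: float) -> Vector:
    """Shift a latent code along the guidance direction by `strength`."""
    return [w + strength * d for w, d in zip(latent, direction)]


# Toy latent codes (made up): the attribute only changes the first coordinate.
with_attr = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
without_attr = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

direction = guidance_vector(with_attr, without_attr)
edited = apply_guidance([0.5, 0.5, 0.5], direction, strength=2.0)
```

In this toy setup the recovered direction points along the first coordinate only, so editing changes that coordinate and leaves the others untouched, which is the behaviour one would hope for in a semantically disentangled latent space.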

Keywords

generative adversarial networks for speech; speech processing; audio signal processing

Cite

@article{arxiv.2308.11859,
  title  = {Example-Based Framework for Perceptually Guided Audio Texture Generation},
  author = {Purnima Kamath and Chitralekha Gupta and Lonce Wyse and Suranga Nanayakkara},
  journal= {arXiv preprint arXiv:2308.11859},
  year   = {2024}
}

Comments

Accepted for publication at IEEE Transactions on Audio, Speech and Language Processing
