Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Tilak Purohit; Imen Ben Mahmoud; Bogdan Vlasenko; Mathew Magimai.-Doss

Sound 2022-07-26 v1 Audio and Speech Processing

Abstract

The ICML Expressive Vocalizations (ExVo) Multi-Task challenge 2022 focuses on understanding the emotional facets of non-linguistic vocalizations, known as vocal bursts (VB). The objective of the challenge is to predict emotional intensities for VB; because it is a multi-task challenge, it also requires predicting each speaker's age and native country. For this challenge we study and compare two distinct embedding spaces, namely self-supervised learning (SSL) based embeddings and task-specific supervised learning based embeddings. To that end, we investigate feature representations obtained from several pre-trained SSL neural networks and from task-specific supervised classification neural networks. Our studies show that the best performance is obtained with a hybrid approach that uses predictions derived via both SSL and task-specific supervised learning. On the test set, our best system surpasses the ComPARE baseline by a relative margin of $13\%$ in $S_{MTL}$, the harmonic mean of all sub-task scores.
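As an illustration of the metric mentioned above, the following minimal sketch computes a harmonic mean over per-task scores and the relative gain of one combined score over another. It assumes each sub-task score is positive and oriented so that higher is better; how the challenge normalizes each sub-task score is not stated here, so that step is left out. The function names and the numbers are hypothetical placeholders, not results from the paper.

```python
from statistics import harmonic_mean


def combined_score(sub_task_scores):
    """Harmonic mean of per-task scores.

    Assumption: every score is positive and higher means better.
    """
    return harmonic_mean(sub_task_scores)


def relative_gain(system, baseline):
    """Relative improvement of `system` over `baseline`, as a fraction."""
    return (system - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT values reported in the paper.
    baseline = combined_score([0.40, 0.40, 0.40])
    system = combined_score([0.50, 0.45, 0.42])
    print(f"baseline S_MTL: {baseline:.3f}")
    print(f"system S_MTL:   {system:.3f}")
    print(f"relative gain:  {relative_gain(system, baseline):.1%}")
```

Because the harmonic mean is pulled toward the smallest of its inputs, a system has to do reasonably well on every sub-task to obtain a high combined score.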

Keywords

self-supervised learning; speaker verification; support vector machine

Cite

@article{arxiv.2206.11968,
  title  = {Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track},
  author = {Tilak Purohit and Imen Ben Mahmoud and Bogdan Vlasenko and Mathew Magimai.-Doss},
  journal= {arXiv preprint arXiv:2206.11968},
  year   = {2022}
}
