Topological Data Analysis for Speech Processing

Eduard Tulchinskii; Kristian Kuznetsov; Laida Kushnareva; Daniil Cherniavskii; Serguei Barannikov; Irina Piontkovskaya; Sergey Nikolenko; Evgeny Burnaev

doi:10.21437/Interspeech.2023-1861

2023-09-12 (v3). Subjects: Sound; Computation and Language; Machine Learning; Audio and Speech Processing; Algebraic Topology

Abstract

We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ in accuracy and $5\%$ in EER on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art accuracy of $80.155$. We also show that topological features can reveal the functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
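As a rough illustration of what a topological feature of an attention map can look like, the sketch below computes the $H_0$ persistence barcode of a single attention matrix. It is not the authors' feature set or code: the symmetrization by element-wise maximum, the distance $1 - w$, the three summary statistics, and the function names `h0_death_times` and `h0_features` are all assumptions made for this example. It relies on the standard fact that the death times of $H_0$ bars in a Vietoris-Rips filtration are the edge weights of a minimum spanning tree.

```python
import numpy as np


def h0_death_times(attention: np.ndarray) -> np.ndarray:
    """Death times of the finite H0 bars for one attention map.

    Assumption (not from the paper): the attention map is symmetrized with
    an element-wise maximum and turned into a distance via 1 - weight.
    The H0 death times then equal the minimum spanning tree edge weights.
    """
    sym = np.maximum(attention, attention.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)

    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known edge from the tree to each node
    deaths = []
    for _ in range(n - 1):  # Prim's algorithm
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(deaths))


def h0_features(attention: np.ndarray) -> np.ndarray:
    """Hypothetical summary features: sum, mean and max of H0 bar lengths.

    Expects at least two tokens, so that at least one finite bar exists.
    """
    deaths = h0_death_times(attention)
    return np.array([deaths.sum(), deaths.mean(), deaths.max()])


# Toy 3-token attention map (rows sum to 1), invented for illustration.
toy_attention = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.5, 0.5],
])
toy_features = h0_features(toy_attention)
```

Feature vectors of this kind, computed per head and concatenated, could then be passed to any linear classifier; which features and which classifier the paper actually uses is described in its appendices, not here.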

Keywords

topological data analysis; speech recognition; sequence alignment

Cite

@article{arxiv.2211.17223,
  title  = {Topological Data Analysis for Speech Processing},
  author = {Eduard Tulchinskii and Kristian Kuznetsov and Laida Kushnareva and Daniil Cherniavskii and Serguei Barannikov and Irina Piontkovskaya and Sergey Nikolenko and Evgeny Burnaev},
  journal= {arXiv preprint arXiv:2211.17223},
  year   = {2023}
}

Comments

Accepted to INTERSPEECH 2023 conference
