Explanations for Automatic Speech Recognition

Sound; Artificial Intelligence; Computation and Language; Machine Learning; Audio and Speech Processing (v1, 2023-03-01)

Abstract

We address quality assessment for neural network based ASR by providing explanations that increase our understanding of the system and ultimately help build trust in it. Compared to simple classification labels, explaining transcriptions is more challenging: judging their correctness is not straightforward, and transcriptions, being variable-length sequences, are not handled by existing interpretable machine learning models. We provide an explanation for an ASR transcription as a subset of audio frames that is both a minimal and sufficient cause of the transcription. To do this, we adapt existing explainable AI (XAI) techniques from image classification: Statistical Fault Localisation (SFL) and Causal. Additionally, we use an adapted version of Local Interpretable Model-Agnostic Explanations (LIME) for ASR as a baseline in our experiments. We evaluate the quality of the explanations generated by the proposed techniques over three different ASR systems (Google API, the baseline model of Sphinx, and Deepspeech) and 100 audio samples from the Commonvoice dataset.
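The notion of an explanation as a minimal and sufficient subset of audio frames can be illustrated with a small sketch. This is not the SFL or Causal technique from the paper; it is a plain greedy deletion search, shown only to make the definition concrete. The names `mask`, `minimal_sufficient_frames`, and `toy_transcribe` are hypothetical, frames are modelled as single floats, "removing" a frame is assumed to mean replacing it with silence (0.0), and the toy transcriber stands in for a real black-box ASR system.

```python
from typing import Callable, List, Sequence, Set

# Assumption: a black-box ASR maps a fixed-length sequence of frames to text.
Transcriber = Callable[[Sequence[float]], str]


def mask(frames: Sequence[float], keep: Set[int], silence: float = 0.0) -> List[float]:
    """Return a copy of `frames` where every frame not in `keep` is silenced."""
    return [f if i in keep else silence for i, f in enumerate(frames)]


def minimal_sufficient_frames(frames: Sequence[float], transcribe: Transcriber) -> Set[int]:
    """Greedy deletion search for a sufficient, 1-minimal set of frame indices.

    Sufficient: keeping only these frames reproduces the original transcription.
    1-minimal: silencing any single remaining frame changes the transcription.
    """
    target = transcribe(frames)
    keep = set(range(len(frames)))
    changed = True
    while changed:  # repeat passes until no frame can be dropped
        changed = False
        for i in sorted(keep):
            trial = keep - {i}
            if transcribe(mask(frames, trial)) == target:
                keep = trial
                changed = True
    return keep


def toy_transcribe(frames: Sequence[float]) -> str:
    """Hypothetical stand-in for an ASR: frame 1 carries 'hello', frame 4 'world'."""
    words = []
    if frames[1] > 0.5:
        words.append("hello")
    if frames[4] > 0.5:
        words.append("world")
    return " ".join(words)


frames = [0.1, 0.9, 0.2, 0.3, 0.8, 0.1]
explanation = minimal_sufficient_frames(frames, toy_transcribe)
print(sorted(explanation))  # [1, 4]
```

Each candidate subset costs one call to the transcriber, so this brute-force search would be expensive against a real ASR system. That cost is one reason to rank frames first and then search in ranked order.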

Cite

@article{arxiv.2302.14062,
  title  = {Explanations for Automatic Speech Recognition},
  author = {Xiaoliang Wu and Peter Bell and Ajitha Rajan},
  journal= {arXiv preprint arXiv:2302.14062},
  year   = {2023}
}

Comments

Accepted by Speech Track, ICASSP 2023
