
Dual Script E2E framework for Multilingual and Code-Switching ASR

Audio and Speech Processing 2021-06-04 v1 Machine Learning Sound

Abstract

India is home to multiple languages, and training automatic speech recognition (ASR) systems for these languages is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this work we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven back-end to recover the native language script. In the second system, we propose a modification to the E2E model, wherein the CLS representation and the native language characters are used simultaneously for training. We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021. Our best results achieve approximately 6% and 5% improvement in word error rate over the baseline system for the multilingual and code-switching tasks, respectively, on the challenge development data.
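
For reference, word error rate (WER), the metric in which the results are reported, is conventionally the word-level edit distance between the reference and the recognized transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, assuming whitespace tokenization; the function name `word_error_rate` is illustrative and is not taken from the paper or from the challenge's scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.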

Cite

@article{arxiv.2106.01400,
  title   = {Dual Script E2E framework for Multilingual and Code-Switching ASR},
  author  = {Mari Ganesh Kumar and Jom Kuriakose and Anand Thyagachandran and Arun Kumar A and Ashish Seth and Lodagala Durga Prasad and Saish Jaiswal and Anusha Prakash and Hema Murthy},
  journal = {arXiv preprint arXiv:2106.01400},
  year    = {2021}
}

Comments

Accepted for publication at Interspeech 2021
