Efficient Weight factorization for Multilingual Speech Recognition

Computation and Language 2021-05-10 v1 Sound Audio and Speech Processing

Abstract

End-to-end multilingual speech recognition trains a single model on a composite speech corpus covering many languages, yielding one neural network that transcribes all of them. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all languages simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions. The key idea is to assign fast weight matrices to each language by decomposing each weight matrix into a shared component and a language-dependent component. The latter is then factorized into vectors using rank-1 assumptions to reduce the number of parameters per language. This efficient factorization scheme proves effective in two multilingual settings with 7 and 27 languages, reducing word error rates by 26% and 27% relative for two popular architectures, LSTM and Transformer, respectively.
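
As a rough illustration of the factorization idea, the sketch below builds a linear layer whose effective weight for each language is a shared matrix plus a rank-1 term formed from two language-specific vectors. The abstract does not say how the shared and language-dependent components are combined, so the additive form used here is an assumption, and the names `FactorizedLinear`, `W_shared`, `r`, and `s` are hypothetical rather than taken from the paper.

```python
import numpy as np


class FactorizedLinear:
    """Linear layer with a shared weight and a rank-1 term per language.

    Assumption (not stated in the abstract): the components are combined
    additively, W_lang = W_shared + outer(r_lang, s_lang).
    """

    def __init__(self, d_in, d_out, n_langs, seed=0):
        rng = np.random.default_rng(seed)
        # Shared component: one full matrix used by every language.
        self.W_shared = rng.standard_normal((d_out, d_in)) * 0.1
        # Language-dependent component: two vectors per language.
        self.r = rng.standard_normal((n_langs, d_out)) * 0.1
        self.s = rng.standard_normal((n_langs, d_in)) * 0.1

    def weight(self, lang):
        # Explicit effective weight matrix, built here only for checking.
        return self.W_shared + np.outer(self.r[lang], self.s[lang])

    def forward(self, x, lang):
        # x has shape (batch, d_in).
        # (W + r s^T) x = W x + r (s . x), so the rank-1 matrix
        # never needs to be materialized.
        shared = x @ self.W_shared.T        # (batch, d_out)
        proj = x @ self.s[lang]             # (batch,)
        return shared + np.outer(proj, self.r[lang])

    def params_per_language(self):
        # d_out + d_in extra parameters instead of d_out * d_in.
        return self.r.shape[1] + self.s.shape[1]
```

Under this assumption, a layer with 8 inputs and 4 outputs adds 12 parameters per language instead of the 32 a full per-language matrix would need, and `forward` gives the same result as multiplying by the explicit matrix returned by `weight`.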

Cite

@article{arxiv.2105.03010,
  title  = {Efficient Weight factorization for Multilingual Speech Recognition},
  author = {Ngoc-Quan Pham and Tuan-Nam Nguyen and Sebastian Stueker and Alexander Waibel},
  journal= {arXiv preprint arXiv:2105.03010},
  year   = {2021}
}

Comments

Submitted to Interspeech 2021
