
Transformer-based Arabic Dialect Identification

Audio and Speech Processing 2020-11-03 v1

Abstract

This paper presents a dialect identification (DID) system based on the transformer neural network architecture. Conventional convolutional neural network (CNN)-based systems use shorter receptive fields. We believe that long-range information is equally important for language and dialect identification, and that the self-attention mechanism in the transformer captures these long-range dependencies. In addition, to reduce computational complexity, self-attention with downsampling is used to process the acoustic features. This process extracts sparse yet informative features. Our experimental results show that the transformer outperforms CNN-based networks on the Arabic dialect identification (ADI) dataset. We also report that score-level fusion of the CNN and transformer-based systems obtains an overall accuracy of 86.29% on the ADI17 database.
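
As a rough illustration of the score-level fusion mentioned above, the sketch below combines per-dialect scores from two systems with a weighted average and picks the highest fused score. The abstract does not specify the fusion rule, so the weighted average, the function names `fuse_scores` and `predict`, the `weight` parameter, and the example scores are all assumptions for illustration, not the authors' method.

```python
from typing import List, Sequence


def fuse_scores(
    cnn_scores: Sequence[float],
    transformer_scores: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Weighted average of per-dialect scores from two systems.

    `weight` is the share given to the CNN scores (hypothetical parameter);
    the transformer scores receive 1 - weight.
    """
    if len(cnn_scores) != len(transformer_scores):
        raise ValueError("both systems must score the same set of dialects")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        weight * c + (1.0 - weight) * t
        for c, t in zip(cnn_scores, transformer_scores)
    ]


def predict(fused_scores: Sequence[float]) -> int:
    """Return the index of the dialect with the highest fused score."""
    return max(range(len(fused_scores)), key=lambda i: fused_scores[i])


# Made-up scores for three dialect classes (not results from the paper).
cnn = [0.5, 0.3, 0.2]
transformer = [0.2, 0.6, 0.2]
fused = fuse_scores(cnn, transformer)
best = predict(fused)
```

In this made-up example the two systems disagree on the top class, and the fused scores decide between them. In practice the weight would typically be tuned on a development set.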

Cite

@article{arxiv.2011.00699,
  title  = {Transformer-based Arabic Dialect Identification},
  author = {Wanqiu Lin and Maulik Madhavi and Rohan Kumar Das and Haizhou Li},
  journal= {arXiv preprint arXiv:2011.00699},
  year   = {2020}
}

Comments

Accepted for publication in International Conference on Asian Language Processing (IALP) 2020
