Arabic Dialect Identification Using BERT-Based Domain Adaptation

Computation and Language 2020-11-16 v1 Machine Learning

Abstract

Arabic is one of the most important and growing languages in the world. With the rise of social media platforms such as Twitter, spoken Arabic dialects have become more widely used. In this paper, we describe our approach to NADI Shared Task 1, which requires building a system to differentiate between 21 Arabic dialects. We introduce a semi-supervised deep learning approach, along with pre-processing, evaluated on the NADI Shared Task 1 corpus. Our system ranks 4th in the NADI shared task competition, achieving a macro-averaged F1 score of 23.09% with a simple yet efficient approach to differentiating between 21 Arabic dialects given tweets.

Cite

@article{arxiv.2011.06977,
  title  = {Arabic Dialect Identification Using BERT-Based Domain Adaptation},
  author = {Ahmad Beltagy and Abdelrahman Wael and Omar ElSherief},
  journal= {arXiv preprint arXiv:2011.06977},
  year   = {2020}
}

Comments

6 pages, 2 figures, WANLP co-located with COLING 2020
