Related papers: Arabic Dialect Identification Using BERT-Based Dom…

Multi-Dialect Arabic BERT for Country-Level Dialect Identification

Arabic dialect identification is a complex problem owing to a number of inherent properties of the language itself. In this paper, we present the experiments conducted and the models developed by our competing team, Mawdoo3 AI, along the way to…

Computation and Language · Computer Science 2020-07-14 Bashar Talafha, Mohammad Ali, Muhy Eddin Za'ter, Haitham Seelawi, Ibraheem Tuffaha, Mostafa Samir, Wael Farhan, Hussein T. Al-Natsheh

Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis

The Arabic language is among the most popular languages in the world with a huge variety of dialects spoken in 22 countries. In this study, we address the problem of classifying 18 Arabic dialects of the QADI dataset of Arabic tweets. RNN…

Computation and Language · Computer Science 2025-07-01 Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy

Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through Dialect Identification using Transformer-based Approach

In this paper, we present our approach for the "Nuanced Arabic Dialect Identification (NADI) Shared Task 2023". We highlight our methodology for subtask 1 which deals with country-level dialect identification. Recognizing dialects plays an…

Computation and Language · Computer Science 2023-12-01 Vedant Deshpande, Yash Patwardhan, Kshitij Deshpande, Sudeep Mangalvedhekar, Ravindra Murumkar

Dialect Identification in Nuanced Arabic Tweets Using Farasa Segmentation and AraBERT

This paper presents our approach to address the EACL WANLP-2021 Shared Task 1: Nuanced Arabic Dialect Identification (NADI). The task is aimed at developing a system that identifies the geographical location (country/province) from where an…

Computation and Language · Computer Science 2021-02-23 Anshul Wadhawan

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short…

Computation and Language · Computer Science 2021-03-02 Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

Arabic Dialect Identification in the Wild

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region. Our method for building this dataset…

Computation and Language · Computer Science 2020-05-18 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect…

Computation and Language · Computer Science 2020-11-11 Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition…

Computation and Language · Computer Science 2025-09-05 Bashar Talafha, Hawau Olamide Toyin, Peter Sullivan, AbdelRahim Elmadany, Abdurrahman Juma, Amirbek Djanibekov, Chiyu Zhang, Hamad Alshehhi, Hanan Aldarmaki, Mustafa Jarrar, Nizar Habash, Muhammad Abdul-Mageed

BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification

Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-level…

Computation and Language · Computer Science 2021-06-24 Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, Ahmed Khoumsi

Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for Classifying Arabic Speech Acts on Twitter

Speech acts are a speaker's actions when performing an utterance within a conversation, such as asking, recommending, greeting, or thanking someone, expressing a thought, or making a suggestion. Understanding speech acts helps interpret the…

Computation and Language · Computer Science 2024-02-01 Khadejaa Alshehri, Areej Alhothali, Nahed Alowidi

Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification

In this paper, we present Arap-Tweet, which is a large-scale and multi-dialectal corpus of Tweets from 11 regions and 16 countries in the Arab world representing the major Arabic dialectal varieties. To build this corpus, we collected data…

Computation and Language · Computer Science 2018-08-24 Wajdi Zaghouani, Anis Charfi

NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task

We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022). NADI aims at advancing state of the art Arabic NLP, including on Arabic dialects. It does so by affording diverse datasets and modeling…

Computation and Language · Computer Science 2022-10-24 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect

Prediction of language varieties and dialects is an important language processing task, with a wide range of applications. For Arabic, the native tongue of ~300 million people, most varieties remain unsupported. To ease this bottleneck, we…

Computation and Language · Computer Science 2019-11-01 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Arun Rajendran, Lyle Ungar

An open access NLP dataset for Arabic dialects: Data collection, labeling, and model construction

Natural Language Processing (NLP) is today a very active field of research and innovation. Many applications, however, need large sets of data for supervised learning, suitably labelled for training purposes. This includes applications for…

Computation and Language · Computer Science 2021-02-23 ElMehdi Boujou, Hamza Chataoui, Abdellah El Mekki, Saad Benjelloun, Ikram Chairi, Ismail Berrada

BERT-Based Arabic Social Media Author Profiling

We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA). We build simple models based on pre-trained bidirectional…

Computation and Language · Computer Science 2019-11-01 Chiyu Zhang, Muhammad Abdul-Mageed

Automatic Arabic Dialect Identification Systems for Written Texts: A Survey

Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. Arabic dialect identification is the first step in various natural language processing…

Computation and Language · Computer Science 2020-09-29 Maha J. Althobaiti

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively…

Computation and Language · Computer Science 2023-10-26 Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation…

Computation and Language · Computer Science 2024-07-09 Muhammad Abdul-Mageed, Amr Keleg, AbdelRahim Elmadany, Chiyu Zhang, Injy Hamed, Walid Magdy, Houda Bouamor, Nizar Habash

ALDi: Quantifying the Arabic Level of Dialectness of Text

Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this…

Computation and Language · Computer Science 2023-10-24 Amr Keleg, Sharon Goldwater, Walid Magdy

Pre-Training BERT on Arabic Tweets: Practical Considerations

Pretraining Bidirectional Encoder Representations from Transformers (BERT) for downstream NLP tasks is a non-trivial task. We pretrained 5 BERT models that differ in the size of their training sets, mixture of formal and informal Arabic, and…

Computation and Language · Computer Science 2021-02-23 Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih