Related papers: Arabic Dialect Identification Using BERT-Based Dom…

200 papers

Arabic dialect identification is a complex problem for a number of inherent properties of the language itself. In this paper, we present the experiments conducted, and the models developed by our competing team, Mawdoo3 AI, along the way to…

The Arabic language is among the most popular languages in the world with a huge variety of dialects spoken in 22 countries. In this study, we address the problem of classifying 18 Arabic dialects of the QADI dataset of Arabic tweets. RNN…

Computation and Language · Computer Science 2025-07-01 Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy

In this paper, we present our approach for the "Nuanced Arabic Dialect Identification (NADI) Shared Task 2023". We highlight our methodology for subtask 1 which deals with country-level dialect identification. Recognizing dialects plays an…

Computation and Language · Computer Science 2023-12-01 Vedant Deshpande, Yash Patwardhan, Kshitij Deshpande, Sudeep Mangalvedhekar, Ravindra Murumkar

This paper presents our approach to address the EACL WANLP-2021 Shared Task 1: Nuanced Arabic Dialect Identification (NADI). The task is aimed at developing a system that identifies the geographical location (country/province) from where an…

Computation and Language · Computer Science 2021-02-23 Anshul Wadhawan

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short…

Computation and Language · Computer Science 2021-03-02 Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region. Our method for building this dataset…

Computation and Language · Computer Science 2020-05-18 Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect…

Computation and Language · Computer Science 2020-11-11 Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition…

Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-level…

Computation and Language · Computer Science 2021-06-24 Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, Ahmed Khoumsi

Speech acts are a speaker's actions when performing an utterance within a conversation, such as asking, recommending, greeting, or thanking someone, expressing a thought, or making a suggestion. Understanding speech acts helps interpret the…

Computation and Language · Computer Science 2024-02-01 Khadejaa Alshehri, Areej Alhothali, Nahed Alowidi

In this paper, we present Arap-Tweet, which is a large-scale and multi-dialectal corpus of Tweets from 11 regions and 16 countries in the Arab world representing the major Arabic dialectal varieties. To build this corpus, we collected data…

Computation and Language · Computer Science 2018-08-24 Wajdi Zaghouani, Anis Charfi

We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI 2022). NADI aims at advancing state of the art Arabic NLP, including on Arabic dialects. It does so by affording diverse datasets and modeling…

Computation and Language · Computer Science 2022-10-24 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

Prediction of language varieties and dialects is an important language processing task, with a wide range of applications. For Arabic, the native tongue of ~300 million people, most varieties remain unsupported. To ease this bottleneck, we…

Computation and Language · Computer Science 2019-11-01 Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Arun Rajendran, Lyle Ungar

Natural Language Processing (NLP) is today a very active field of research and innovation. Many applications, however, need large sets of data for supervised learning, suitably labelled for training purposes. This includes applications for…

Computation and Language · Computer Science 2021-02-23 ElMehdi Boujou, Hamza Chataoui, Abdellah El Mekki, Saad Benjelloun, Ikram Chairi, Ismail Berrada

We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA). We build simple models based on pre-trained bidirectional…

Computation and Language · Computer Science 2019-11-01 Chiyu Zhang, Muhammad Abdul-Mageed

Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. Arabic dialect identification is the first step in various natural language processing…

Computation and Language · Computer Science 2020-09-29 Maha J. Althobaiti

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively…

Computation and Language · Computer Science 2023-10-26 Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation…

Computation and Language · Computer Science 2024-07-09 Muhammad Abdul-Mageed, Amr Keleg, AbdelRahim Elmadany, Chiyu Zhang, Injy Hamed, Walid Magdy, Houda Bouamor, Nizar Habash

Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this…

Computation and Language · Computer Science 2023-10-24 Amr Keleg, Sharon Goldwater, Walid Magdy

Pretraining Bidirectional Encoder Representations from Transformers (BERT) for downstream NLP tasks is a non-trivial task. We pretrained 5 BERT models that differ in the size of their training sets, mixture of formal and informal Arabic, and…

Computation and Language · Computer Science 2021-02-23 Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih