Related papers

Related papers: Exploiting Dialect Identification in Automatic Dia…

200 papers

Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. Arabic dialect identification is the first step in various natural language processing…

Computation and Language · Computer Science 2020-09-29 Maha J. Althobaiti

Social media user-generated text is the main resource for many NLP tasks. This text, however, does not follow the standard rules of writing. Moreover, the use of dialect such as Moroccan Arabic in written communications increases…

Computation and Language · Computer Science 2022-06-22 Randa Zarnoufi , Walid Bachri , Hamid Jaafar , Mounia Abik

This paper presents the design and development of multi-dialect automatic speech recognition for Arabic. Deep neural networks are becoming an effective tool to solve sequential data problems, particularly, adopting an end-to-end training of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-30 Abbas Raza Ali

This paper presents a novel dotless representation of Arabic text as an alternative to the standard Arabic text representation. We delve into its implications through comprehensive analysis across five diverse corpora and four different…

Computation and Language · Computer Science 2023-12-27 Maged S. Al-Shaibani , Irfan Ahmad

Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this…

Computation and Language · Computer Science 2023-10-24 Amr Keleg , Sharon Goldwater , Walid Magdy

Natural Language Processing (NLP) is today a very active field of research and innovation. However, many applications need large sets of data for supervised learning, suitably labelled for the training purpose. This includes applications for…

Computation and Language · Computer Science 2021-02-23 ElMehdi Boujou , Hamza Chataoui , Abdellah El Mekki , Saad Benjelloun , Ikram Chairi , Ismail Berrada

We investigate different approaches for dialect identification in Arabic broadcast speech, using phonetic and lexical features obtained from a speech recognition system, and acoustic features using the i-vector framework. We studied both…

Computation and Language · Computer Science 2016-08-12 Ahmed Ali , Najim Dehak , Patrick Cardinal , Sameer Khurana , Sree Harsha Yella , James Glass , Peter Bell , Steve Renals

The widespread absence of diacritical marks in Arabic text poses a significant challenge for Arabic natural language processing (NLP). This paper explores instances of naturally occurring diacritics, referred to as "diacritics in the wild,"…

Computation and Language · Computer Science 2024-06-11 Salman Elgamal , Ossama Obeid , Tameem Kabbani , Go Inoue , Nizar Habash

Natural Language Processing (NLP) is a vital computational method for addressing language processing, analysis, and generation. NLP tasks form the core of many daily applications, from automatic text correction to speech recognition. While…

Computation and Language · Computer Science 2024-10-18 Caroline Sabty

This paper presents a novel Dialectal Sound and Vowelization Recovery framework, designed to recognize borrowed and dialectal sounds within phonologically diverse and dialect-rich languages, that extends beyond its standard orthographic…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-06 Yassine El Kheir , Hamdy Mubarak , Ahmed Ali , Shammur Absar Chowdhury

Based on an annotated multimedia corpus, the television series Marāyā 2013, we dig into the question of "automatic standardization" of Arabic dialects for machine translation. Here we distinguish between rule-based machine translation…

Computation and Language · Computer Science 2023-01-10 Abidrabbo Alnassan

Arabic text diacritization remains a persistent challenge in natural language processing due to the language's morphological richness. In this paper, we introduce Sadeed, a novel approach based on a fine-tuned decoder-only language model…

Computation and Language · Computer Science 2025-08-22 Zeina Aldallal , Sara Chrouf , Khalil Hennara , Mohamed Motaism Hamed , Muhammad Hreden , Safwan AlModhayan

We observe a recent behaviour on social media, in which users intentionally remove consonantal dots from Arabic letters, in order to bypass content-classification algorithms. Content classification is typically done by fine-tuning…

Computation and Language · Computer Science 2021-11-19 Aviad Rom , Kfir Bar

The diacritization process attempts to restore the short vowels that are typically omitted in written Arabic text. This process is essential for applications such as Text-to-Speech (TTS). While diacritization of Modern Standard Arabic (MSA)…

Computation and Language · Computer Science 2019-06-03 Ahmed Abdelali , Mohammed Attia , Younes Samih , Kareem Darwish , Hamdy Mubarak

Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets were developed, and yearly shared tasks have been running since 2018. However, ADI systems are…

Computation and Language · Computer Science 2023-10-23 Amr Keleg , Walid Magdy

The first step in any NLP pipeline is to split the text into individual tokens. The most obvious and straightforward approach is to use words as tokens. However, given a large text corpus, representing all the words is not efficient in…

Computation and Language · Computer Science 2021-09-30 Zaid Alyafeai , Maged S. Al-shaibani , Mustafa Ghaleb , Irfan Ahmad

In many languages like Arabic, diacritics are used to specify pronunciations as well as meanings. Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word. This results in a…

Computation and Language · Computer Science 2020-06-09 Sawsan Alqahtani , Ajay Mishra , Mona Diab

Dialectal Arabic (DA) speech data vary widely in domain coverage, dialect labeling practices, and recording conditions, complicating cross-dataset comparison and model evaluation. To characterize this landscape, we conduct a computational…

Computation and Language · Computer Science 2026-01-30 Peter Sullivan , AbdelRahim Elmadany , Alcides Alcoba Inciarte , Muhammad Abdul-Mageed

Language in the Arab world presents a complex diglossic and multilingual setting, involving the use of Modern Standard Arabic, various dialects and sub-dialects, as well as multiple European languages. This diverse linguistic landscape has…

Computation and Language · Computer Science 2025-01-24 Injy Hamed , Caroline Sabty , Slim Abdennadher , Ngoc Thang Vu , Thamar Solorio , Nizar Habash

Diacritization of Arabic text is both an interesting and a challenging problem, with various applications ranging from speech synthesis to helping students learn the Arabic language. Like many other tasks or problems in…

Computation and Language · Computer Science 2019-05-07 Ali Fadel , Ibraheem Tuffaha , Bara' Al-Jawarneh , Mahmoud Al-Ayyoub