Related papers: Challenges in Kurdish Text Processing

Towards Machine Translation for the Kurdish Language

Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics and has been motivating to facilitate human…

Computation and Language · Computer Science 2020-10-14 Sina Ahmadi, Mariam Masoud

Developing a Fine-Grained Corpus for a Less-resourced Language: the case of Kurdish

Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish…

Computation and Language · Computer Science 2019-09-26 Roshna Omer Abdulrahman, Hossein Hassani, Sina Ahmadi

Language and Speech Technology for Central Kurdish Varieties

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in…

Computation and Language · Computer Science 2024-03-05 Sina Ahmadi, Daban Q. Jaff, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages…

Computation and Language · Computer Science 2023-04-05 Sina Ahmadi, Zahra Azin, Sara Belelli, Antonios Anastasopoulos

Recent Advancements and Challenges of Turkic Central Asian Language Processing

Research in NLP for Central Asian Turkic languages - Kazakh, Uzbek, Kyrgyz, and Turkmen - faces typical low-resource language challenges like data scarcity, limited linguistic resources and technology development. However, recent…

Computation and Language · Computer Science 2026-02-17 Yana Veitsman, Mareike Hartmann

From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification

The complexity and difficulties of Kurdish speaker detection among its several dialects are investigated in this work. Because of its great phonetic and lexical differences, Kurdish with several dialects including Kurmanji, Sorani, and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-07 Abdulhady Abas Abdullah, Soran Badawi, Dana A. Abdullah, Dana Rasul Hamad

Transliterating Kurdish texts in Latin into Persian-Arabic script

Kurdish is written in different scripts. The two most popular scripts are Latin and Persian-Arabic. However, not all Kurdish readers are familiar with both mentioned scripts that could be resolved by automatic transliterators. So far, the…

Computation and Language · Computer Science 2021-10-26 Hossein Hassani

Central Kurdish machine translation: First large scale parallel corpus and experiments

While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially…

Artificial Intelligence · Computer Science 2021-06-18 Zhila Amini, Mohammad Mohammadamini, Hawre Hosseini, Mehran Mansouri, Daban Jaff

KurdSTS: The Kurdish Semantic Textual Similarity

Semantic Textual Similarity (STS) measures the degree of meaning overlap between two texts and underpins many NLP tasks. While extensive resources exist for high-resource languages, low-resource languages such as Kurdish remain underserved.…

Computation and Language · Computer Science 2025-12-01 Abdulhady Abas Abdullah, Hadi Veisi, Hussein M. Al

Challenges Encountered in Turkish Natural Language Processing Studies

Natural language processing is a branch of computer science that combines artificial intelligence with linguistics. It aims to analyze a language element such as writing or speaking with software and convert it into information. Considering…

Computation and Language · Computer Science 2021-01-28 Kadir Tohma, Yakup Kutlu

Resources for Turkish Natural Language Processing: A critical survey

This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the…

Computation and Language · Computer Science 2023-02-28 Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu

Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning

Speaker diarization is a fundamental task in speech processing that involves dividing an audio stream by speaker. Although state-of-the-art models have advanced performance in high-resource languages, low-resource languages such as Kurdish…

Sound · Computer Science 2025-04-29 Abdulhady Abas Abdullah, Sarkhel H. Taher Karim, Sara Azad Ahmed, Kanar R. Tariq, Tarik A. Rashid

Towards Finite-State Morphology of Kurdish

Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech…

Computation and Language · Computer Science 2020-05-22 Sina Ahmadi, Hossein Hassani

Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts

Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the…

Computation and Language · Computer Science 2020-05-01 Roshna Omer Abdulrahman, Hossein Hassani

A Rule-based Kurdish Text Transliteration System

In this article, we present a rule-based approach for transliterating two mostly used orthographies in Sorani Kurdish. Our work consists of detecting a character in a word by removing the possible ambiguities and mapping it into the target…

Computation and Language · Computer Science 2018-11-27 Sina Ahmadi

From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language

Automatic Speech Recognition (ASR) technology has witnessed significant advancements in recent years, revolutionizing human-computer interactions. While major languages have benefited from these developments, lesser-resourced languages like…

Computation and Language · Computer Science 2024-11-25 Muhammad Sharif, Zeeshan Abbas, Jiangyan Yi, Chenglin Liu

The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach

Over recent years a lot of research papers and studies have been published on the development of effective approaches that benefit from a large amount of user-generated content and build intelligent predictive models on top of them. This…

Computation and Language · Computer Science 2021-01-21 Mohammad Kasra Habib

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have…

Computation and Language · Computer Science 2020-10-06 Sina Ahmadi, Hossein Hassani, Daban Q. Jaff

Challenges of Computational Processing of Code-Switching

This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed. It refers to code-switching which has become more popular in our daily life and therefore…

Computation and Language · Computer Science 2016-10-10 Özlem Çetinoğlu, Sarah Schulz, Ngoc Thang Vu

Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines

Kurdish libraries have many historical publications that were printed back in the early days when printing devices were brought to Kurdistan. Having a good Optical Character Recognition (OCR) to help process these publications and…

Computation and Language · Computer Science 2024-04-10 Blnd Yaseen, Hossein Hassani