Related papers: Challenges in Kurdish Text Processing

200 papers

Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics and has been motivating to facilitate human…

Computation and Language · Computer Science 2020-10-14 Sina Ahmadi , Mariam Masoud

Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish…

Computation and Language · Computer Science 2019-09-26 Roshna Omer Abdulrahman , Hossein Hassani , Sina Ahmadi

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in…

Computation and Language · Computer Science 2024-03-05 Sina Ahmadi , Daban Q. Jaff , Md Mahfuz Ibn Alam , Antonios Anastasopoulos

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages…

Computation and Language · Computer Science 2023-04-05 Sina Ahmadi , Zahra Azin , Sara Belelli , Antonios Anastasopoulos

Research in NLP for Central Asian Turkic languages - Kazakh, Uzbek, Kyrgyz, and Turkmen - faces typical low-resource language challenges like data scarcity, limited linguistic resources and technology development. However, recent…

Computation and Language · Computer Science 2026-02-17 Yana Veitsman , Mareike Hartmann

The complexity and difficulties of Kurdish speaker detection among its several dialects are investigated in this work. Because of its great phonetic and lexical differences, Kurdish with several dialects including Kurmanji, Sorani, and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-07 Abdulhady Abas Abdullah , Soran Badawi , Dana A. Abdullah , Dana Rasul Hamad

Kurdish is written in different scripts. The two most popular scripts are Latin and Persian-Arabic. However, not all Kurdish readers are familiar with both mentioned scripts that could be resolved by automatic transliterators. So far, the…

Computation and Language · Computer Science 2021-10-26 Hossein Hassani

While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially…

Artificial Intelligence · Computer Science 2021-06-18 Zhila Amini , Mohammad Mohammadamini , Hawre Hosseini , Mehran Mansouri , Daban Jaff

Semantic Textual Similarity (STS) measures the degree of meaning overlap between two texts and underpins many NLP tasks. While extensive resources exist for high-resource languages, low-resource languages such as Kurdish remain underserved.…

Computation and Language · Computer Science 2025-12-01 Abdulhady Abas Abdullah , Hadi Veisi , Hussein M. Al

Natural language processing is a branch of computer science that combines artificial intelligence with linguistics. It aims to analyze a language element such as writing or speaking with software and convert it into information. Considering…

Computation and Language · Computer Science 2021-01-28 Kadir Tohma , Yakup Kutlu

This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the…

Computation and Language · Computer Science 2023-02-28 Çağrı Çöltekin , A. Seza Doğruöz , Özlem Çetinoğlu

Speaker diarization is a fundamental task in speech processing that involves dividing an audio stream by speaker. Although state-of-the-art models have advanced performance in high-resource languages, low-resource languages such as Kurdish…

Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech…

Computation and Language · Computer Science 2020-05-22 Sina Ahmadi , Hossein Hassani

Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the…

Computation and Language · Computer Science 2020-05-01 Roshna Omer Abdulrahman , Hossein Hassani

In this article, we present a rule-based approach for transliterating two mostly used orthographies in Sorani Kurdish. Our work consists of detecting a character in a word by removing the possible ambiguities and mapping it into the target…

Computation and Language · Computer Science 2018-11-27 Sina Ahmadi

Automatic Speech Recognition (ASR) technology has witnessed significant advancements in recent years, revolutionizing human-computer interactions. While major languages have benefited from these developments, lesser-resourced languages like…

Computation and Language · Computer Science 2024-11-25 Muhammad Sharif , Zeeshan Abbas , Jiangyan Yi , Chenglin Liu

Over recent years a lot of research papers and studies have been published on the development of effective approaches that benefit from a large amount of user-generated content and build intelligent predictive models on top of them. This…

Computation and Language · Computer Science 2021-01-21 Mohammad Kasra Habib

Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have…

Computation and Language · Computer Science 2020-10-06 Sina Ahmadi , Hossein Hassani , Daban Q. Jaff

This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed. It refers to code-switching which has become more popular in our daily life and therefore…

Computation and Language · Computer Science 2016-10-10 Özlem Çetinoğlu , Sarah Schulz , Ngoc Thang Vu

Kurdish libraries have many historical publications that were printed back in the early days when printing devices were brought to Kurdistan. Having a good Optical Character Recognition (OCR) to help process these publications and…

Computation and Language · Computer Science 2024-04-10 Blnd Yaseen , Hossein Hassani