Related papers: Morphological Disambiguation from Stemming Data

A Morphology-aware Network for Morphological Disambiguation

Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological…

Computation and Language · Computer Science 2017-02-14 Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez

Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation

This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological…

cmp-lg · Computer Science 2008-02-03 Gokhan Tur

KinyaBERT: a Morphology-aware Kinyarwanda Language Model

Pre-trained language models such as BERT have been successful at tackling many natural language processing tasks. However, the unsupervised sub-word tokenization methods commonly used in these models (e.g., byte-pair encoding - BPE) are…

Computation and Language · Computer Science 2023-04-26 Antoine Nzeyimana, Andre Niyongabo Rubungo

Neural disambiguation of lemma and part of speech in morphologically rich languages

We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a…

Computation and Language · Computer Science 2020-07-14 José María Hoya Quecedo, Maximilian W. Koppatz, Giacomo Furlan, Roman Yangarber

Morphological Disambiguation by Voting Constraints

We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting parses that…

cmp-lg · Computer Science 2016-08-31 Kemal Oflazer, Gokhan Tur

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the…

Computation and Language · Computer Science 2022-03-18 Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation

This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological phenomena.…

cmp-lg · Computer Science 2008-02-03 Kemal Oflazer, Gokhan Tur

Stemming -- The Evolution and Current State with a Focus on Bangla

Bangla, the seventh most widely spoken language worldwide with 300 million native speakers, faces digital under-representation due to limited resources and lack of annotated datasets. Stemming, a critical preprocessing step in language…

Computation and Language · Computer Science 2025-08-22 Abhijit Paul, Mashiat Amin Farin, Sharif Md. Abdullah, Ahmedul Kabir, Zarif Masud, Shebuti Rayana

Morphological Analysis of Japanese Hiragana Sentences using the BI-LSTM CRF Model

This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information…

Computation and Language · Computer Science 2022-01-11 Jun Izutsu, Kanako Komiya

Bootstrapping Techniques for Polysynthetic Morphological Analysis

Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word. This complexity, together with a general scarcity of written data, poses a challenge to the…

Computation and Language · Computer Science 2020-05-05 William Lane, Steven Bird

A Straightforward Approach to Morphological Analysis and Synthesis

In this paper we present a lexicon-based approach to the problem of morphological processing. Full-form words, lemmas and grammatical tags are interconnected in a DAWG. Thus, the process of analysis/synthesis is reduced to a search in the…

Computation and Language · Computer Science 2007-05-23 Kyriakos N. Sgarbas, Nikos D. Fakotakis, George K. Kokkinakis

Understanding Compositional Data Augmentation in Typologically Diverse Morphological Inflection

Data augmentation techniques are widely used in low-resource automatic morphological inflection to overcome data sparsity. However, the full implications of these techniques remain poorly understood. In this study, we aim to shed light on…

Computation and Language · Computer Science 2023-10-25 Farhan Samir, Miikka Silfverberg

TAMS: Translation-Assisted Morphological Segmentation

Canonical morphological segmentation is the process of analyzing words into the standard (aka underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to…

Computation and Language · Computer Science 2024-10-16 Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, Katharina von der Wense

Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related…

Computation and Language · Computer Science 2021-09-29 Preslav Nakov, Hwee Tou Ng

Tagging French Without Lexical Probabilities -- Combining Linguistic Knowledge And Statistical Learning

This paper explores morpho-syntactic ambiguities for French to develop a strategy for part-of-speech disambiguation that a) reflects the complexity of French as an inflected language, b) optimizes the estimation of probabilities, c) allows…

cmp-lg · Computer Science 2007-05-23 Evelyne Tzoukermann, Dragomir R. Radev, William A. Gale

Improving Named Entity Recognition by Jointly Learning to Disambiguate Morphological Tags

Previous studies have shown that linguistic features of a word such as possession, genitive or other grammatical cases can be employed in word representations of a named entity recognition (NER) tagger to improve the performance for…

Computation and Language · Computer Science 2019-11-12 Onur Güngör, Suzan Üsküdarlı, Tunga Güngör

Morphological Analysis as Classification: an Inductive-Learning Approach

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of…

cmp-lg · Computer Science 2008-02-03 Antal van den Bosch, Walter Daelemans, Ton Weijters

Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that…

Computation and Language · Computer Science 2017-06-02 Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, Anna Korhonen

Morphological Segmentation Inside-Out

Morphological segmentation has traditionally been modeled with non-hierarchical models, which yield flat segmentations as output. In many cases, however, proper morphological analysis requires hierarchical structure -- especially in the…

Computation and Language · Computer Science 2021-02-16 Ryan Cotterell, Arun Kumar, Hinrich Schütze

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a…

Computation and Language · Computer Science 2020-04-30 Mika Hämäläinen, Linda Wiechetek