Related papers: Morphological Disambiguation from Stemming Data

200 papers

Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological…

Computation and Language · Computer Science 2017-02-14 Eray Yildiz , Caglar Tirkaz , H. Bahadir Sahin , Mustafa Tolga Eren , Ozan Sonmez

This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological…

cmp-lg · Computer Science 2008-02-03 Gokhan Tur

Pre-trained language models such as BERT have been successful at tackling many natural language processing tasks. However, the unsupervised sub-word tokenization methods commonly used in these models (e.g., byte-pair encoding - BPE) are…

Computation and Language · Computer Science 2023-04-26 Antoine Nzeyimana , Andre Niyongabo Rubungo

We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a…

Computation and Language · Computer Science 2020-07-14 José María Hoya Quecedo , Maximilian W. Koppatz , Giacomo Furlan , Roman Yangarber

We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting parses that…

cmp-lg · Computer Science 2016-08-31 Kemal Oflazer , Gokhan Tur

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the…

Computation and Language · Computer Science 2022-03-18 Adam Wiemerslage , Miikka Silfverberg , Changbing Yang , Arya D. McCarthy , Garrett Nicolai , Eliana Colunga , Katharina Kann

This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological phenomena.…

cmp-lg · Computer Science 2008-02-03 Kemal Oflazer , Gokhan Tur

Bangla, the seventh most widely spoken language worldwide with 300 million native speakers, faces digital under-representation due to limited resources and lack of annotated datasets. Stemming, a critical preprocessing step in language…

Computation and Language · Computer Science 2025-08-22 Abhijit Paul , Mashiat Amin Farin , Sharif Md. Abdullah , Ahmedul Kabir , Zarif Masud , Shebuti Rayana

This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information…

Computation and Language · Computer Science 2022-01-11 Jun Izutsu , Kanako Komiya

Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word. This complexity, together with a general scarcity of written data, poses a challenge to the…

Computation and Language · Computer Science 2020-05-05 William Lane , Steven Bird

In this paper we present a lexicon-based approach to the problem of morphological processing. Full-form words, lemmas and grammatical tags are interconnected in a DAWG. Thus, the process of analysis/synthesis is reduced to a search in the…

Computation and Language · Computer Science 2007-05-23 Kyriakos N. Sgarbas , Nikos D. Fakotakis , George K. Kokkinakis

Data augmentation techniques are widely used in low-resource automatic morphological inflection to overcome data sparsity. However, the full implications of these techniques remain poorly understood. In this study, we aim to shed light on…

Computation and Language · Computer Science 2023-10-25 Farhan Samir , Miikka Silfverberg

Canonical morphological segmentation is the process of analyzing words into the standard (aka underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to…

Computation and Language · Computer Science 2024-10-16 Enora Rice , Ali Marashian , Luke Gessler , Alexis Palmer , Katharina von der Wense

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related…

Computation and Language · Computer Science 2021-09-29 Preslav Nakov , Hwee Tou Ng

This paper explores morpho-syntactic ambiguities for French to develop a strategy for part-of-speech disambiguation that a) reflects the complexity of French as an inflected language, b) optimizes the estimation of probabilities, c) allows…

cmp-lg · Computer Science 2007-05-23 Evelyne Tzoukermann , Dragomir R. Radev , William A. Gale

Previous studies have shown that linguistic features of a word such as possession, genitive or other grammatical cases can be employed in word representations of a named entity recognition (NER) tagger to improve the performance for…

Computation and Language · Computer Science 2019-11-12 Onur Güngör , Suzan Üsküdarlı , Tunga Güngör

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of…

cmp-lg · Computer Science 2008-02-03 Antal van den Bosch , Walter Daelemans , Ton Weijters

Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that…

Computation and Language · Computer Science 2017-06-02 Ivan Vulić , Nikola Mrkšić , Roi Reichart , Diarmuid Ó Séaghdha , Steve Young , Anna Korhonen

Morphological segmentation has traditionally been modeled with non-hierarchical models, which yield flat segmentations as output. In many cases, however, proper morphological analysis requires hierarchical structure -- especially in the…

Computation and Language · Computer Science 2021-02-16 Ryan Cotterell , Arun Kumar , Hinrich Schütze

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a…

Computation and Language · Computer Science 2020-04-30 Mika Hämäläinen , Linda Wiechetek