Related papers: Word segmentation granularity in Korean

200 papers

We provide a detailed overview of various approaches to word segmentation of Asian languages, specifically Chinese, Korean, and Japanese. For each language, the approaches to word segmentation differ. We also include our…

Computation and Language · Computer Science 2024-07-30 Matthew Rho, Yexin Tian, Qin Chen

This paper attempts to analyze the Korean sentence classification system for a chatbot. Sentence classification is the task of classifying an input sentence based on predefined categories. However, spelling or spacing errors contained in the…

Computation and Language · Computer Science 2021-06-08 DongHyun Choi, IlNam Park, Myeong Cheol Shin, EungGyun Kim, Dong Ryeol Shin

While most of the speech and natural language systems which were developed for English and other Indo-European languages neglect the morphological processing and integrate speech and natural language at the word level, for the agglutinative…

cmp-lg · Computer Science 2008-02-03 WonIl Lee, Geunbae Lee, Jong-Hyeok Lee

For readability and disambiguation of the written text, appropriate word segmentation is recommended for documentation, and it also holds for the digitized texts. If the language is agglutinative while far from scriptio continua, for…

Computation and Language · Computer Science 2021-05-05 Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim

Most of the post-processing methods for character recognition rely on contextual information of character and word-fragment levels. However, due to linguistic characteristics of Korean, such low-level information alone is not sufficient for…

cmp-lg · Computer Science 2008-02-03 Geunbae Lee, Jong-Hyeok Lee, JinHee Yoo

The design of Korean constituency treebanks raises a fundamental representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treating morphemes as constituency terminals conflates…

Computation and Language · Computer Science 2025-12-30 Jungyeul Park, Chulwoo Park

In this study, we introduce KOPL, a novel framework for handling Korean OOV words with Phoneme representation Learning. Our work is based on the linguistic property of Korean as a phonemic script, the high correlation between phonemes and…

Computation and Language · Computer Science 2025-07-08 Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee

This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In…

Computation and Language · Computer Science 2007-11-22 Ivan Berlocher, Hyun-Gue Huh, Eric Laporte, Jee-Sun Nam

We present in this work a new Universal Morphology dataset for Korean. Previously, the Korean language has been underrepresented in the field of morphological paradigms amongst hundreds of diverse world languages. Hence, we propose this…

Computation and Language · Computer Science 2023-05-18 Eunkyul Leah Jo, Kyuwon Kim, Xihan Wu, KyungTae Lim, Jungyeul Park, Chulwoo Park

Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition. For such cases where the conjuncts consist of the components representing…

Computation and Language · Computer Science 2019-09-20 Won Ik Cho, Seok Min Kim, Nam Soo Kim

Word embedding has become a fundamental component to many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so it is not directly…

Computation and Language · Computer Science 2017-08-08 Sanghyuk Choi, Taeuk Kim, Jinseok Seol, Sang-goo Lee

A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous possibly large vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly…

cmp-lg · Computer Science 2008-02-03 Geunbae Lee, Jong-Hyeok Lee

Word segmentation is the first step of any task in Vietnamese language processing. This paper reviews state-of-the-art approaches and systems for word segmentation in Vietnamese. To have an overview of all stages from building corpora to…

Computation and Language · Computer Science 2019-06-19 Song Nguyen Duc Cong, Quoc Hung Ngo, Rachsuda Jiamthapthaksin

Due to the fact that Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically employs the use of sub-character features known as graphemes or otherwise utilizes comprehensive prior…

Computation and Language · Computer Science 2018-06-29 Andrew Matteson, Chanhee Lee, Young-Bum Kim, Heuiseok Lim

We present in this paper a novel framework for morpheme segmentation which uses the morpho-syntactic regularities preserved by word representations, in addition to orthographic features, to segment words into morphemes. This framework is…

Computation and Language · Computer Science 2017-05-02 Tarek Sakakini, Suma Bhat, Pramod Viswanath

Recent advancements in morpheme segmentation primarily emphasize word-level segmentation, often neglecting the contextual relevance within the sentence. In this study, we redefine the morpheme segmentation task as a sequence-to-sequence…

Computation and Language · Computer Science 2024-12-18 Prabin Bhandari, Abhishek Paudel

Intention identification is a core issue in dialog management. However, due to the non-canonicality of the spoken language, it is difficult to extract the content automatically from the conversation-style utterances. This is much more…

Computation and Language · Computer Science 2019-07-10 Won Ik Cho, Young Ki Moon, Woo Hyun Kang, Nam Soo Kim

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or…

Computation and Language · Computer Science 2007-05-23 Rie Kubota Ando, Lillian Lee

We describe a resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. The output of our system is a graph of morphemes annotated with accurate linguistic information. The language…

Computation and Language · Computer Science 2007-11-22 Hyun-Gue Huh, Eric Laporte

Khmer text is written from left to right with optional spaces. A space does not serve as a word boundary; instead, it is used for readability or other functional purposes. Word segmentation is a prerequisite step for downstream tasks such as…

Computation and Language · Computer Science 2021-04-01 Rina Buoy, Nguonly Taing, Sokchea Kor