Related papers: Sequence-to-Sequence Lexical Normalization with Mu…

200 papers

Social media offer an abundant source of valuable raw data, however informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai

We define multilevel text normalization as sequence-to-sequence processing that transforms naturally noisy text into a sequence of normalized units of meaning (morphemes) in three steps: 1) writing normalization, 2) lemmatization, 3)…

Computation and Language · Computer Science 2019-04-01 Tatyana Ruzsics, Tanja Samardžić

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little…

Computation and Language · Computer Science 2018-09-06 Daniel Watson, Nasser Zalmout, Nizar Habash

Social media networks and chatting platforms often use an informal version of natural text. Adversarial spelling attacks also tend to alter the input text by modifying the characters in the text. Normalizing these texts is an essential step…

Computation and Language · Computer Science 2020-06-26 Fenil Doshi, Jimit Gandhi, Deep Gosalia, Sudhir Bagul

Social media data has been of interest to Natural Language Processing (NLP) practitioners for over a decade, because of its richness in information, but also challenges for automatic processing. Since language use is more informal,…

Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on…

Computation and Language · Computer Science 2021-11-17 Yoon Kim

Real-world NLP applications often deal with nonstandard text (e.g., dialectal, informal, or misspelled text). However, language models like BERT deteriorate in the face of dialect variation or noise. How do we push BERT's modeling…

Computation and Language · Computer Science 2023-11-02 Aarohi Srivastava, David Chiang

Sequence-to-sequence transduction is the core problem in language processing applications as diverse as semantic parsing, machine translation, and instruction following. The neural network models that provide the dominant solution to these…

Computation and Language · Computer Science 2021-06-09 Ekin Akyürek, Jacob Andreas

What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages? We conduct a thorough empirical experiment in 10 languages to ascertain this, considering five factors: (1) the…

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu, Yu Wang, Ngoc Quach

Social media data is a valuable resource for research, yet it contains a wide range of non-standard words (NSW). These irregularities hinder the effective operation of NLP tools. Current state-of-the-art methods for the Vietnamese language…

Computation and Language · Computer Science 2024-07-26 Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Nguyet Thi Nguyen, Khanh Thanh-Duy Ho, Kiet Van Nguyen

Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character-level, while…

Computation and Language · Computer Science 2019-11-15 Michael Stewart, Wei Liu, Rachel Cardell-Oliver

Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large-Language Models…

Computation and Language · Computer Science 2024-01-18 Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard…

Computation and Language · Computer Science 2019-10-31 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art…

Computation and Language · Computer Science 2018-10-24 Linfeng Song, Yue Zhang, Daniel Gildea

Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to…

Computation and Language · Computer Science 2019-03-19 Enrique Manjavacas, Ákos Kádár, Mike Kestemont

The ability of semantic reasoning over the sentence pair is essential for many natural language understanding tasks, e.g., natural language inference and machine reading comprehension. A recent significant improvement in these tasks comes…

Computation and Language · Computer Science 2021-06-18 Weidi Xu, Xingyi Cheng, Kunlong Chen, Wei Wang, Bin Bi, Ming Yan, Chen Wu, Luo Si, Wei Chu, Taifeng Wang

Without real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo parallel data generated with the back-translation method for the model training. However, due to weak supervision, the pseudo…

Computation and Language · Computer Science 2019-01-15 Shuo Ren, Zhirui Zhang, Shujie Liu, Ming Zhou, Shuai Ma

Summarization of long-form text data is a problem especially pertinent in knowledge economy jobs such as medicine and finance, that require continuously remaining informed on a sophisticated and evolving body of knowledge. As such,…

Computation and Language · Computer Science 2022-04-22 Brydon Parker, Alik Sokolov, Mahtab Ahmed, Matt Kalebic, Sedef Akinli Kocak, Ofer Shai

This paper proposes a sequence-to-sequence learning approach for Arabic pronoun resolution, which explores the effectiveness of using advanced natural language processing (NLP) techniques, specifically Bi-LSTM and the BERT pre-trained…

Computation and Language · Computer Science 2023-05-22 Hanan S. Murayshid, Hafida Benhidour, Said Kerrache