Related papers: Normalizing Text using Language Modelling based on…

200 papers

We propose neural models that can normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers the similarities of both word strings and sounds, a model that considers only the…

Computation and Language · Computer Science 2020-11-05 Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Social media offer an abundant source of valuable raw data, however informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai

Text normalization is an essential task in the processing and analysis of social media that is dominated with informal writing. It aims to map informal words to their intended standard forms. Previously proposed text normalization…

Computation and Language · Computer Science 2017-12-29 Salman Ahmad Ansari, Usman Zafar, Asim Karim

Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we…

Computation and Language · Computer Science 2024-01-04 Himmet Toprak Kesgin, Mehmet Fatih Amasyali

Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large-Language Models…

Computation and Language · Computer Science 2024-01-18 Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day to day digital communication. This discrepancy has led to severe performance degradation of…

Computation and Language · Computer Science 2021-10-13 Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu

In this article, we introduce a set of methods to naturalize text based on natural human speech. Voice-based interactions provide a natural way of interfacing with electronic systems and are seeing a widespread adaptation of late. These…

Computation and Language · Computer Science 2020-11-10 Richa Sharma, Parth Vipul Shah, Ashwini M. Joshi

We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of dynamic memory access and storage mechanism, we present a neural architecture…

Computation and Language · Computer Science 2019-04-05 Subhojeet Pramanik, Aman Hussain

The ubiquity of the contemporary language understanding tasks gives relevance to the development of generalized, yet highly efficient models that utilize all knowledge, provided by the data source. In this work, we present SocialBERT - the…

Computation and Language · Computer Science 2021-11-16 Ilia Karpov, Nick Kartashev

The proliferation of hate speech on social media platforms has necessitated the development of effective detection and moderation tools. This study evaluates the efficacy of various machine learning models in identifying hate speech and…

Computation and Language · Computer Science 2026-02-25 Saurabh Mishra, Shivani Thakur, Radhika Mamidi

Text-based adversarial attacks are becoming more commonplace and accessible to general internet users. As these attacks proliferate, the need to address the gap in model robustness becomes imminent. While retraining on adversarial data may…

Computation and Language · Computer Science 2022-06-10 Joanna Bitton, Maya Pavlova, Ivan Evtimov

The absence of standardized spelling conventions and the organic evolution of human language present an inherent linguistic challenge within historical documents, a longstanding concern for scholars in the humanities. Addressing this issue,…

Computation and Language · Computer Science 2025-07-01 Miguel Domingo, Francisco Casacuberta

Pre-trained models are widely used in the tasks of natural language processing nowadays. However, in the specific field of text simplification, the research on improving pre-trained models is still blank. In this work, we propose a…

Computation and Language · Computer Science 2022-04-19 Renliang Sun, Xiaojun Wan

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat, Navdeep Jaitly

In the digital age of today, the internet has become an indispensable platform for people's lives, work, and information exchange. However, the problem of violent text proliferation in the network environment has arisen, which has brought…

Computation and Language · Computer Science 2024-12-24 Yongsheng Yang, Xiaoying Wang

Recent advances in pre-trained language modeling have facilitated significant progress across various natural language processing (NLP) tasks. Word masking during model training constitutes a pivotal component of language modeling in…

Computation and Language · Computer Science 2024-02-27 Anas Belfathi, Ygor Gallina, Nicolas Hernandez, Richard Dufour, Laura Monceaux

Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in…

Computation and Language · Computer Science 2015-08-11 Nut Limsopatham, Nigel Collier

We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models…

Computation and Language · Computer Science 2021-11-04 David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

Historic variations of spelling pose a challenge for full-text search or natural language processing on historical digitized texts. To minimize the gap between the historic orthography and contemporary spelling, usually an automatic…

Computation and Language · Computer Science 2025-02-26 Anton Ehrmanntraut

User-generated content published on microblogging social networks constitutes a priceless source of information. However, microtexts usually deviate from the standard lexical and grammatical rules of the language, thus making its processing…

Computation and Language · Computer Science 2024-02-06 Yerai Doval, Manuel Vilares, Jesús Vilares