Related papers: Text normalization using memory augmented neural n…

RNN Approaches to Text Normalization: A Challenge

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat, Navdeep Jaitly

Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method

Inverse text normalization (ITN) is crucial for converting spoken form into written form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written form, ASR systems often output…

Computation and Language · Computer Science 2023-09-19 Juntae Kim, Minkyu Lim, Seokjin Hong

A Chat About Boring Problems: Studying GPT-based text normalization

Text normalization, the conversion of text from written to spoken form, is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large Language Models…

Computation and Language · Computer Science 2024-01-18 Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

Neural text normalization leveraging similarities of strings and sounds

We propose neural models that can normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers the similarities of both word strings and sounds, a model that considers only the…

Computation and Language · Computer Science 2020-11-05 Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the…

Computation and Language · Computer Science 2016-10-26 Marcel Bollmann, Anders Søgaard

Normalizing Text using Language Modelling based on Phonetics and String Similarity

Social media networks and chatting platforms often use an informal version of natural text. Adversarial spelling attacks also tend to alter the input text by modifying the characters in the text. Normalizing these texts is an essential step…

Computation and Language · Computer Science 2020-06-26 Fenil Doshi, Jimit Gandhi, Deep Gosalia, Sudhir Bagul

Minimally Supervised Written-to-Spoken Text Normalization

In speech applications such as text-to-speech (TTS) or automatic speech recognition (ASR), text normalization refers to the task of converting from a written representation into a representation of how the text is to be…

Computation and Language · Computer Science 2016-09-22 Ke Wu, Kyle Gorman, Richard Sproat

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little…

Computation and Language · Computer Science 2018-09-06 Daniel Watson, Nasser Zalmout, Nizar Habash

Adapting Sequence to Sequence models for Text Normalization in Social Media

Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai

Language Agnostic Data-Driven Inverse Text Normalization

With the emergence of automatic speech recognition (ASR) models, converting the spoken-form text (from ASR) to the written form is urgently needed. This inverse text normalization (ITN) problem attracts the attention of researchers from…

Computation and Language · Computer Science 2023-01-25 Szu-Jui Chen, Debjyoti Paul, Yutong Pang, Peng Su, Xuedong Zhang

DeepNorm-A Deep Learning Approach to Text Normalization

This paper presents a simple yet sophisticated approach to the challenge by Sproat and Jaitly (2016): given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function.…

Computation and Language · Computer Science 2017-12-20 Maryam Zare, Shaurya Rohatgi

Adversarial Text Normalization

Text-based adversarial attacks are becoming more commonplace and accessible to general internet users. As these attacks proliferate, the need to address the gap in model robustness becomes imminent. While retraining on adversarial data may…

Computation and Language · Computer Science 2022-06-10 Joanna Bitton, Maya Pavlova, Ivan Evtimov

LSTM Neural Reordering Feature for Statistical Machine Translation

Artificial neural networks are powerful models that have been widely applied to many aspects of machine translation, such as language modeling and translation modeling. Though notable improvements have been made in these areas, the…

Computation and Language · Computer Science 2017-09-25 Yiming Cui, Shijin Wang, Jianfeng Li

Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries.…

Neural and Evolutionary Computing · Computer Science 2016-08-31 Orhan Karaali, Gerald Corrigan, Ira Gerson, Noel Massey

An Experimental Study of LSTM Encoder-Decoder Model for Text Simplification

Text simplification (TS) aims to reduce the lexical and structural complexity of a text, while still retaining the semantic meaning. Current automatic TS techniques are limited to either lexical-level applications or manually defining a…

Computation and Language · Computer Science 2016-09-14 Tong Wang, Ping Chen, Kevin Amaral, Jipeng Qiang

Improving Data Driven Inverse Text Normalization using Data Augmentation

Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural…

Computation and Language · Computer Science 2022-07-21 Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

Regularizing and Optimizing LSTM Language Models

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this…

Computation and Language · Computer Science 2017-08-09 Stephen Merity, Nitish Shirish Keskar, Richard Socher

On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

Advanced neural network models have penetrated Automatic Speech Recognition (ASR) in recent years; however, in language modeling many systems still rely on traditional Back-off N-gram Language Models (BNLM) partly or entirely. The reason…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-04 Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik

Sequence Learning with RNNs for Medical Concept Normalization in User-Generated Texts

In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a disease mention in free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language…

Computation and Language · Computer Science 2018-11-30 Elena Tutubalina, Zulfat Miftahutdinov, Sergey Nikolenko, Valentin Malykh

Neural Inverse Text Normalization

While there have been several contributions exploring state-of-the-art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state…

Computation and Language · Computer Science 2021-02-15 Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff