Related papers: Language Agnostic Data-Driven Inverse Text Normali…

Improving Data Driven Inverse Text Normalization using Data Augmentation

Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural…

Computation and Language · Computer Science 2022-07-21 Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method

Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output…

Computation and Language · Computer Science 2023-09-19 Juntae Kim, Minkyu Lim, Seokjin Hong

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

Inverse Text Normalization (ITN) is crucial for converting spoken Automatic Speech Recognition (ASR) outputs into well-formatted written text, enhancing both readability and usability. Despite its importance, the integration of streaming…

Computation and Language · Computer Science 2025-06-02 Luong Ho, Khanh Le, Vinh Pham, Bao Nguyen, Tan Tran, Duc Chau

Neural Inverse Text Normalization

While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state…

Computation and Language · Computer Science 2021-02-15 Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted…

Computation and Language · Computer Science 2022-11-08 Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

NeMo Inverse Text Normalization: From Development To Production

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted…

Computation and Language · Computer Science 2021-05-18 Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

Text normalization using memory augmented neural networks

We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of dynamic memory access and storage mechanism, we present a neural architecture…

Computation and Language · Computer Science 2019-04-05 Subhojeet Pramanik, Aman Hussain

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms.…

Computation and Language · Computer Science 2022-08-02 Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text…

Computation and Language · Computer Science 2022-10-28 Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

A Unified Transformer-based Framework for Duplex Text Normalization

Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or…

Computation and Language · Computer Science 2021-08-24 Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing and both achieve impressive performance thanks to the recent advance in deep learning and large amount of aligned speech and text data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio

Bootstrapping speech recognition on limited data resources has been an area of active research for long. The recent transition to all-neural models and end-to-end (E2E) training brought along particular challenges as these models are known…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-21 Manuel Giollo, Deniz Gunceler, Yulan Liu, Daniel Willett

Language-agnostic Multilingual Modeling

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-22 Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

The rapid development of neural text-to-speech (TTS) systems enabled its usage in other areas of natural language processing such as automatic speech recognition (ASR) or spoken language translation (SLT). Due to the large number of…

Computation and Language · Computer Science 2024-08-01 Nick Rossenbach, Ralf Schlüter, Sakriani Sakti

Text-only adaptation in LLM-based ASR through text denoising

Adapting large language model (LLM)-based automatic speech recognition (ASR) systems to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on the target domain text often disrupts…

Sound · Computer Science 2026-03-13 Andrés Carofilis, Sergio Burdisso, Esaú Villatoro-Tello, Shashi Kumar, Kadri Hacioglu, Srikanth Madikeri, Pradeep Rangappa, Manjunath K E, Petr Motlicek, Shankar Venkatesan, Andreas Stolcke

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

While automatic speech recognition (ASR) systems have achieved remarkable performance with large-scale datasets, their efficacy remains inadequate in low-resource settings, encompassing dialects, accents, minority languages, and long-tail…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-23 Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many low-resourced languages lack audio-text aligned data, and supervised methods cannot be applied on them. In this work, we propose a…

Computation and Language · Computer Science 2018-08-14 Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee

Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features

While neural text-to-speech systems perform remarkably well in high-resource scenarios, they cannot be applied to the majority of the over 6,000 spoken languages in the world due to a lack of appropriate training data. In this work, we use…

Computation and Language · Computer Science 2022-03-08 Florian Lux, Ngoc Thang Vu

Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems

Today, many state-of-the-art automatic speech recognition (ASR) systems apply all-neural models that map audio to word sequences trained end-to-end along one global optimisation criterion in a fully data driven fashion. These models allow…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-11 Xianrui Zheng, Yulan Liu, Deniz Gunceler, Daniel Willett

RNN Approaches to Text Normalization: A Challenge

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat, Navdeep Jaitly