Related papers: Improving Data Driven Inverse Text Normalization u…

Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method

Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output…

Computation and Language · Computer Science 2023-09-19 Juntae Kim, Minkyu Lim, Seokjin Hong

Language Agnostic Data-Driven Inverse Text Normalization

With the emergence of automatic speech recognition (ASR) models, converting the spoken form text (from ASR) to the written form is in urgent need. This inverse text normalization (ITN) problem attracts the attention of researchers from…

Computation and Language · Computer Science 2023-01-25 Szu-Jui Chen, Debjyoti Paul, Yutong Pang, Peng Su, Xuedong Zhang

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted…

Computation and Language · Computer Science 2022-11-08 Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

NeMo Inverse Text Normalization: From Development To Production

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted…

Computation and Language · Computer Science 2021-05-18 Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

Neural Inverse Text Normalization

While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state…

Computation and Language · Computer Science 2021-02-15 Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

A Unified Transformer-based Framework for Duplex Text Normalization

Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or…

Computation and Language · Computer Science 2021-08-24 Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

Text Generation with Speech Synthesis for ASR Data Augmentation

Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data…

Computation and Language · Computer Science 2023-05-29 Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

Inverse Text Normalization (ITN) is crucial for converting spoken Automatic Speech Recognition (ASR) outputs into well-formatted written text, enhancing both readability and usability. Despite its importance, the integration of streaming…

Computation and Language · Computer Science 2025-06-02 Luong Ho, Khanh Le, Vinh Pham, Bao Nguyen, Tan Tran, Duc Chau

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms.…

Computation and Language · Computer Science 2022-08-02 Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text…

Computation and Language · Computer Science 2022-10-28 Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis

This paper investigates the use of unsupervised text-to-speech synthesis (TTS) as a data augmentation method to improve accented speech recognition. TTS systems are trained with a small amount of accented speech training data and their…

Computation and Language · Computer Science 2024-07-08 Cong-Thanh Do, Shuhei Imai, Rama Doddipatla, Thomas Hain

TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis

In this paper, we propose a text-to-speech (TTS)-driven data augmentation method for improving the quality of a non-autoregressive (AR) TTS system. Recently proposed non-AR models, such as FastSpeech 2, have successfully achieved fast…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-27 Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition. However, there remain several challenging scenarios that E2E models are not…

Computation and Language · Computer Science 2023-06-16 Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Towards Selection of Text-to-speech Data to Augment ASR Training

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model. We trained a neural network,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-05 Shuo Liu, Leda Sarı, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

Text normalization using memory augmented neural networks

We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of dynamic memory access and storage mechanism, we present a neural architecture…

Computation and Language · Computer Science 2019-04-05 Subhojeet Pramanik, Aman Hussain

ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-23 Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Aluísio, Moacir Antonelli Ponti

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces…

Computation and Language · Computer Science 2023-06-13 Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-21 Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

Recent advances in text-to-speech (TTS) led to the development of flexible multi-speaker end-to-end TTS systems. We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS…

Computation and Language · Computer Science 2020-02-18 Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation

Data augmentation is a technique to generate new training data based on existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with…

Computation and Language · Computer Science 2023-06-12 Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler