Related papers: Language Agnostic Data-Driven Inverse Text Normali…

200 papers

Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural…

Computation and Language · Computer Science 2022-07-21 Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output…

Computation and Language · Computer Science 2023-09-19 Juntae Kim, Minkyu Lim, Seokjin Hong

Inverse Text Normalization (ITN) is crucial for converting spoken Automatic Speech Recognition (ASR) outputs into well-formatted written text, enhancing both readability and usability. Despite its importance, the integration of streaming…

Computation and Language · Computer Science 2025-06-02 Luong Ho, Khanh Le, Vinh Pham, Bao Nguyen, Tan Tran, Duc Chau

While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state…

Computation and Language · Computer Science 2021-02-15 Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted…

Computation and Language · Computer Science 2022-11-08 Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted…

Computation and Language · Computer Science 2021-05-18 Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of dynamic memory access and storage mechanism, we present a neural architecture…

Computation and Language · Computer Science 2019-04-05 Subhojeet Pramanik, Aman Hussain

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms.…

Computation and Language · Computer Science 2022-08-02 Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text…

Computation and Language · Computer Science 2022-10-28 Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or…

Computation and Language · Computer Science 2021-08-24 Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing and both achieve impressive performance thanks to the recent advance in deep learning and large amount of aligned speech and text data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Bootstrapping speech recognition on limited data resources has been an area of active research for long. The recent transition to all-neural models and end-to-end (E2E) training brought along particular challenges as these models are known…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-21 Manuel Giollo, Deniz Gunceler, Yulan Liu, Daniel Willett

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-22 Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

The rapid development of neural text-to-speech (TTS) systems enabled its usage in other areas of natural language processing such as automatic speech recognition (ASR) or spoken language translation (SLT). Due to the large number of…

Computation and Language · Computer Science 2024-08-01 Nick Rossenbach, Ralf Schlüter, Sakriani Sakti

Adapting large language model (LLM)-based automatic speech recognition (ASR) systems to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on the target domain text often disrupts…

While automatic speech recognition (ASR) systems have achieved remarkable performance with large-scale datasets, their efficacy remains inadequate in low-resource settings, encompassing dialects, accents, minority languages, and long-tail…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-23 Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen

Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many low-resourced languages lack audio-text aligned data, and supervised methods cannot be applied on them. In this work, we propose a…

Computation and Language · Computer Science 2018-08-14 Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee

While neural text-to-speech systems perform remarkably well in high-resource scenarios, they cannot be applied to the majority of the over 6,000 spoken languages in the world due to a lack of appropriate training data. In this work, we use…

Computation and Language · Computer Science 2022-03-08 Florian Lux, Ngoc Thang Vu

Today, many state-of-the-art automatic speech recognition (ASR) systems apply all-neural models that map audio to word sequences trained end-to-end along one global optimisation criterion in a fully data driven fashion. These models allow…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-11 Xianrui Zheng, Yulan Liu, Deniz Gunceler, Daniel Willett

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat, Navdeep Jaitly