English
Related papers

Related papers: A Unified Transformer-based Framework for Duplex T…

200 papers

Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output…

Computation and Language · Computer Science 2023-09-19 Juntae Kim , Minkyu Lim , Seokjin Hong

Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural…

Computation and Language · Computer Science 2022-07-21 Laxmi Pandey , Debjyoti Paul , Pooja Chitkara , Yutong Pang , Xuedong Zhang , Kjell Schubert , Mark Chou , Shu Liu , Yatharth Saraf

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms.…

Computation and Language · Computer Science 2022-08-02 Alexandra Antonova , Evelina Bakhturina , Boris Ginsburg

While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state…

Computation and Language · Computer Science 2021-02-15 Monica Sunkara , Chaitanya Shivade , Sravan Bodapati , Katrin Kirchhoff

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text…

Computation and Language · Computer Science 2022-10-28 Sharman Tan , Piyush Behre , Nick Kibre , Issac Alphonso , Shuangyu Chang

Transformer-based text to speech (TTS) model (e.g., Transformer TTS~\cite{li2019neural}, FastSpeech~\cite{ren2019fastspeech}) has shown the advantages of training and inference efficiency over RNN-based model (e.g.,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-04 Mingjian Chen , Xu Tan , Yi Ren , Jin Xu , Hao Sun , Sheng Zhao , Tao Qin , Tie-Yan Liu

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering…

Computation and Language · Computer Science 2025-11-06 Michel Wong , Ali Alshehri , Sophia Kao , Haotian He

With the emergence of automatic speech recognition (ASR) models, converting the spoken form text (from ASR) to the written form is in urgent need. This inverse text normalization (ITN) problem attracts the attention of researchers from…

Computation and Language · Computer Science 2023-01-25 Szu-Jui Chen , Debjyoti Paul , Yutong Pang , Peng Su , Xuedong Zhang

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard. We propose a novel architecture to facilitate it for multiple languages while using data less than 3% of the size of the data used by the state of…

Computation and Language · Computer Science 2021-04-19 Shubhi Tyagi , Antonio Bonafonte , Jaime Lorenzo-Trueba , Javier Latorre

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted…

Computation and Language · Computer Science 2022-11-08 Yashesh Gaur , Nick Kibre , Jian Xue , Kangyuan Shu , Yuhui Wang , Issac Alphanso , Jinyu Li , Yifan Gong

Inverse Text Normalization (ITN) is crucial for converting spoken Automatic Speech Recognition (ASR) outputs into well-formatted written text, enhancing both readability and usability. Despite its importance, the integration of streaming…

Computation and Language · Computer Science 2025-06-02 Luong Ho , Khanh Le , Vinh Pham , Bao Nguyen , Tan Tran , Duc Chau

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat , Navdeep Jaitly

Text normalization, or the process of transforming text into a consistent, canonical form, is crucial for speech applications such as text-to-speech synthesis (TTS). In TTS, the system must decide whether to verbalize "1995" as "nineteen…

Machine Learning · Computer Science 2022-02-02 Jae Hun Ro , Felix Stahlberg , Ke Wu , Shankar Kumar

We define multilevel text normalization as sequence-to-sequence processing that transforms naturally noisy text into a sequence of normalized units of meaning (morphemes) in three steps: 1) writing normalization, 2) lemmatization, 3)…

Computation and Language · Computer Science 2019-04-01 Tatyana Ruzsics , Tanja Samardžić

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted…

Computation and Language · Computer Science 2021-05-18 Yang Zhang , Evelina Bakhturina , Kyle Gorman , Boris Ginsburg

This paper presents an simple yet sophisticated approach to the challenge by Sproat and Jaitly (2016)- given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function.…

Computation and Language · Computer Science 2017-12-20 Maryam Zare , Shaurya Rohatgi

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-17 Junrui Ni , Liming Wang , Heting Gao , Kaizhi Qian , Yang Zhang , Shiyu Chang , Mark Hasegawa-Johnson

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, where neural methods became capable of producing audios with high naturalness. However, these efforts still suffer from two types of latencies: (a) the {\em…

Computation and Language · Computer Science 2020-10-08 Mingbo Ma , Baigong Zheng , Kaibo Liu , Renjie Zheng , Hairong Liu , Kainan Peng , Kenneth Church , Liang Huang

The capability to jointly process multi-modal information is becoming an essential task. However, the limited number of paired multi-modal data and the large computational requirements in multi-modal learning hinder the development. We…

Computation and Language · Computer Science 2025-06-09 Minsu Kim , Jee-weon Jung , Hyeongseop Rha , Soumi Maiti , Siddhant Arora , Xuankai Chang , Shinji Watanabe , Yong Man Ro

Text-to-speech (TTS) and voice conversion (VC) are two different tasks both aiming at generating high quality speaking voice according to different input modality. Due to their similarity, this paper proposes UnifySpeech, which brings TTS…

Sound · Computer Science 2023-01-11 Haogeng Liu , Tao Wang , Ruibo Fu , Jiangyan Yi , Zhengqi Wen , Jianhua Tao
‹ Prev 1 2 3 10 Next ›