Related papers: Towards Automatic Data Augmentation for Disordered…

Adversarial Data Augmentation for Disordered Speech Recognition

Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-08-03 Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large…

Sound · Computer Science 2022-01-20 Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL…

Sound · Computer Science 2024-01-02 Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home…

Sound · Computer Science 2024-06-14 Wing-Zin Leung, Mattias Cross, Anton Ragni, Stefan Goetze

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces…

Computation and Language · Computer Science 2023-06-13 Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-05 Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-17 Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen

Dual Application of Speech Enhancement for Automatic Speech Recognition

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and…

Sound · Computer Science 2020-11-10 Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf

Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation

Automatic speech recognition (ASR) systems often falter while processing stuttering-related disfluencies -- such as involuntary blocks and word repetitions -- yielding inaccurate transcripts. A critical barrier to progress is the scarcity…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-03 Dena Mujtaba, Nihar R. Mahapatra, Megan Arney, J. Scott Yaruss, Caryn Herring, Jia Bin

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Psychoacoustic studies have shown that locally-time reversed (LTR) speech, i.e., signal samples time-reversed within a short segment, can be accurately recognised by human listeners. This study addresses the question of how well a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-12 Si-Ioi Ng, Tan Lee

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-21 Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

Text-To-Speech Data Augmentation for Low Resource Speech Recognition

Nowadays, the main problem of deep learning techniques used in the development of automatic speech recognition (ASR) models is the lack of transcribed data. The goal of this research is to propose a new data augmentation method to improve…

Computation and Language · Computer Science 2022-04-04 Rodolfo Zevallos

Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets. Despite this, speech SSL representations may fail while…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-02 Salah Zaiem, Titouan Parcollet, Slim Essid

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

Recent advances in text-to-speech (TTS) led to the development of flexible multi-speaker end-to-end TTS systems. We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS…

Computation and Language · Computer Science 2020-02-18 Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL…

Sound · Computer Science 2023-06-23 Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models overfitting remains the largest problem, outweighing performance improvements…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-04 Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel

Recent Progress in the CUHK Dysarthric Speech Recognition System

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date. Disordered speech presents a wide spectrum of challenges to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-01 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis

This paper investigates the use of unsupervised text-to-speech synthesis (TTS) as a data augmentation method to improve accented speech recognition. TTS systems are trained with a small amount of accented speech training data and their…

Computation and Language · Computer Science 2024-07-08 Cong-Thanh Do, Shuhei Imai, Rama Doddipatla, Thomas Hain

Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors

Speech-based virtual assistants, such as Amazon Alexa, Google assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR) and feed the text to downstream dialog models for…

Computation and Language · Computer Science 2020-06-11 Longshaokan Wang, Maryam Fazel-Zarandi, Aditya Tiwari, Spyros Matsoukas, Lazaros Polymenakos

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages,…

Computation and Language · Computer Science 2023-05-22 Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, Martijn Wieling