Related papers: SpecAugment: A Simple Data Augmentation Method for…

200 papers

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-12 Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

SpecAugment is a very effective data augmentation method for both HMM and E2E-based automatic speech recognition (ASR) systems. Notably, it also works in low-resource scenarios. However, SpecAugment masks the spectrum of time or the…

Sound · Computer Science 2022-10-18 Rui Li, Guodong Ma, Dexin Zhao, Ranran Zeng, Xiaoyu Li, Hao Huang

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost implementation method applied directly to the audio input features and it consists of masking blocks of…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model by taking a weighted combination of two different speech features…

Computation and Language · Computer Science 2021-02-26 Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu

End-to-end models have achieved significant improvement on automatic speech recognition. One common method to improve performance of these models is expanding the data-space through data augmentation. Meanwhile, human auditory inspired…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-12 Zehai Tu, Jack Deadman, Ning Ma, Jon Barker

Varying data augmentation policies and regularization over the course of optimization has led to performance improvements over using fixed values. We show that population based training is a useful tool to continuously search those…

Computation and Language · Computer Science 2020-10-09 Daniel Haziza, Jérémy Rapin, Gabriel Synnaeve

In this paper, we present SpecAugment++, a novel data augmentation method for deep neural network-based acoustic scene classification (ASC). Different from other popular data augmentation methods such as SpecAugment and mixup that only…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-16 Helin Wang, Yuexian Zou, Wenwu Wang

Inspired by SpecAugment, a data augmentation method for end-to-end ASR systems, we propose a frame-level SpecAugment method (f-SpecAugment) to improve the performance of deep convolutional neural networks (CNN) for hybrid HMM-based ASR…

Computation and Language · Computer Science 2020-12-09 Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu

A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to…

Sound · Computer Science 2021-08-09 Gwantae Kim, David K. Han, Hanseok Ko

In this paper, we perform an in-depth study of how data augmentation techniques improve synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel variability, different audio compressions, different…

Sound · Computer Science 2021-10-22 Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim Permuter

Data augmentation is a technique to generate new training data based on existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with…

Computation and Language · Computer Science 2023-06-12 Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

Data augmentation is a widely adopted technique utilized to improve the robustness of automatic speech recognition (ASR). Employing a fixed data augmentation strategy for all training data is a common practice. However, it is important to…

Sound · Computer Science 2024-12-03 Hongxuan Lu, Biao Li

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-22 Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

Recently, a semi-supervised learning method known as "noisy student training" has been shown to improve image classification performance of deep networks significantly. Noisy student training is an iterative self-training method that…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-02 Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data…

Computation and Language · Computer Science 2024-11-05 Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian

Data augmentations are known to improve robustness in speech-processing tasks. In this study, we summarize and compare different data augmentation strategies using the S3PRL toolkit. We explore how HuBERT and wav2vec perform using different…

Sound · Computer Science 2024-04-01 Mina Huh, Ruchira Ray, Corey Karnei

Most of the current speech data augmentation methods operate on either the raw waveform or the amplitude spectrum of speech. In this paper, we propose a novel speech data augmentation method called PhasePerturbation that operates…

Sound · Computer Science 2023-12-15 Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang

We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-28 Arya D. McCarthy, Liezl Puzon, Juan Pino

Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored…

Sound · Computer Science 2024-05-07 June-Woo Kim, Miika Toikkanen, Sangmin Bae, Minseok Kim, Ho-Young Jung

Acoustic environments affect acoustic characteristics of sound to be recognized by physically interacting with sound wave propagation. Thus, training acoustic models for audio and speech tasks requires regularization on various acoustic…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-08 Hyeonuk Nam, Seong-Hu Kim, Yong-Hwa Park