Related papers: Data-selective Transfer Learning for Multi-Domain …

A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition

This paper presents a transfer learning method in speech emotion recognition based on a Time-Delay Neural Network (TDNN) architecture. A major challenge in the current speech-based emotion detection research is data scarcity. The proposed…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-18 Sitong Zhou , Homayoon Beigi

Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language

Acoustic word embedding models map variable duration speech segments to fixed dimensional vectors, enabling efficient speech search and discovery. Previous work explored how embeddings can be obtained in zero-resource settings where no…

Computation and Language · Computer Science 2021-06-25 Christiaan Jacobs , Herman Kamper

Transfer Learning for Speech and Language Processing

Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in…

Computation and Language · Computer Science 2015-11-20 Dong Wang , Thomas Fang Zheng

Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition

Selecting in-domain data from a large pool of diverse and out-of-domain data is a non-trivial problem. In most cases simply using all of the available data will lead to sub-optimal and in some cases even worse performance compared to…

Computation and Language · Computer Science 2019-07-03 Mortaza Doulaty , Thomas Hain

Self-Train Before You Transcribe

When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-21 Robert Flynn , Anton Ragni

Unsupervised Data Selection via Discrete Speech Representation for ASR

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Zhiyun Lu , Yongqiang Wang , Yu Zhang , Wei Han , Zhehuai Chen , Parisa Haghani

How Class Ontology and Data Scale Affect Audio Transfer Learning

Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from a large pre-training data basis when confronted with a task of limited data. Despite its ubiquitous use and clear benefits,…

Machine Learning · Computer Science 2026-05-20 Manuel Milling , Andreas Triantafyllopoulos , Alexander Gebhard , Simon Rampp , Björn W. Schuller

Meta-Transfer Learning for Code-Switched Speech Recognition

An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and…

Computation and Language · Computer Science 2020-04-30 Genta Indra Winata , Samuel Cahyawijaya , Zhaojiang Lin , Zihan Liu , Peng Xu , Pascale Fung

Domain Adaptive Transfer Learning with Specialist Models

Transfer learning is a widely used method to build high performing computer vision models. In this paper, we study the efficacy of transfer learning by examining how the choice of data impacts performance. We find that more pre-training…

Computer Vision and Pattern Recognition · Computer Science 2018-12-13 Jiquan Ngiam , Daiyi Peng , Vijay Vasudevan , Simon Kornblith , Quoc V. Le , Ruoming Pang

Domain Adaptation Using Class Similarity for Robust Speech Recognition

When only limited target domain data is available, domain adaptation could be used to promote performance of deep neural network (DNN) acoustic model by leveraging well-trained source model and target domain data. However, suffering from…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-06 Han Zhu , Jiangjiang Zhao , Yuling Ren , Li Wang , Pengyuan Zhang

Large-Scale Domain Adaptation via Teacher-Student Learning

High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually…

Computation and Language · Computer Science 2017-08-21 Jinyu Li , Michael L. Seltzer , Xi Wang , Rui Zhao , Yifan Gong

An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding

Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency. This study explores the effects of varying quantities of target task training data…

Computation and Language · Computer Science 2022-10-24 Josiah Ross , Luke Yoffe , Alon Albalak , William Yang Wang

Classification Algorithm of Speech Data of Parkinsons Disease Based on Convolution Sparse Kernel Transfer Learning with Optimal Kernel and Parallel Sample Feature Selection

Labeled speech data from patients with Parkinson's disease (PD) are scarce, and the statistical distributions of training and test data differ significantly in the existing datasets. To solve these problems, dimensional reduction and sample…

Machine Learning · Computer Science 2020-02-11 Xiaoheng Zhang , Yongming Li , Pin Wang , Xiaoheng Tan , Yuchuan Liu

Learning Transferable Features for Speech Emotion Recognition

Emotion recognition from speech is one of the key steps towards emotional intelligence in advanced human-machine interaction. Identifying emotions in human speech requires learning features that are robust and discriminative across diverse…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-30 Alison Marczewski , Adriano Veloso , Nívio Ziviani

Characterizing and Avoiding Negative Transfer

When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely…

Machine Learning · Computer Science 2019-10-08 Zirui Wang , Zihang Dai , Barnabás Póczos , Jaime Carbonell

Anti-Transfer Learning for Task Invariance in Convolutional Neural Networks for Speech Processing

We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks. While transfer learning assumes that the learning process for a target task will benefit from re-using representations…

Machine Learning · Computer Science 2021-01-14 Eric Guizzo , Tillman Weyde , Giacomo Tarroni

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Automatic speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity, i.e., insufficient amounts of carefully labeled data to…

Sound · Computer Science 2021-08-17 Sarala Padi , Seyed Omid Sadjadi , Dinesh Manocha , Ram D. Sriram

Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech which typically contains high variability in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-11 Nilaksh Das , Sravan Bodapati , Monica Sunkara , Sundararajan Srinivasan , Duen Horng Chau

Unsupervised Domain Discovery using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition

Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However the concept of domain is complex and not bound to clear criteria. Hence it is often not evident if data should be considered to…

Computation and Language · Computer Science 2015-09-23 Mortaza Doulaty , Oscar Saz , Thomas Hain

Adapting TTS models For New Speakers using Transfer Learning

Training neural text-to-speech (TTS) models for a new speaker typically requires several hours of high quality speech data. Prior works on voice cloning attempt to address this challenge by adapting pre-trained multi-speaker TTS models for…

Sound · Computer Science 2022-04-07 Paarth Neekhara , Jason Li , Boris Ginsburg