Related papers: Self-supervised Learning with Speech Modulation Dr…

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-17 Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

Self-supervised Learning for Speech Enhancement

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-19 Yu-Che Wang, Shrikant Venkataramani, Paris Smaragdis

State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions

Self-attention has been a huge success for many downstream tasks in NLP, which led to exploration of applying self-attention to speech problems as well. The efficacy of self-attention in speech applications, however, seems not fully blown…

Computation and Language · Computer Science 2019-10-03 Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma

Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders

Speech contains information that is clinically relevant to some diseases, which has the potential to be used for health assessment. Recent work shows an interest in applying deep learning algorithms, especially pretrained large speech…

Sound · Computer Science 2024-07-02 Hok-Shing Lau, Mark Huntly, Nathon Morgan, Adesua Iyenoma, Biao Zeng, Tim Bashford

Self Multi-Head Attention for Speaker Recognition

Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are averaged to…

Sound · Computer Science 2019-07-03 Miquel India, Pooyan Safari, Javier Hernando

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication. Learning speaker representations, in the context of supervised learning,…

Machine Learning · Computer Science 2022-07-13 Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

Self-Adaptive Training: Bridging Supervised and Self-Supervised Learning

We propose self-adaptive training -- a unified training algorithm that dynamically calibrates and enhances training processes by model predictions without incurring an extra computational cost -- to advance both supervised and…

Machine Learning · Computer Science 2022-10-17 Lang Huang, Chao Zhang, Hongyang Zhang

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

Data-selective Transfer Learning for Multi-Domain Speech Recognition

Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics. This paper proposes a novel technique to overcome negative transfer by…

Machine Learning · Computer Science 2015-09-18 Mortaza Doulaty, Oscar Saz, Thomas Hain

Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints

Self-supervised pre-training using unlabeled data is widely used in automatic speech recognition. In this paper, we propose a new self-supervised pre-training approach to dealing with heterogeneous data. Instead of mixing all the data and…

Machine Learning · Computer Science 2025-09-10 Xiaodong Cui, A F M Saif, Brian Kingsbury, Tianyi Chen

Self-Teaching Networks

We propose self-teaching networks to improve the generalization capacity of deep neural networks. The idea is to generate soft supervision labels using the output layer for training the lower layers of the network. During the network…

Audio and Speech Processing · Electrical Eng. & Systems 2019-09-11 Liang Lu, Eric Sun, Yifan Gong

Semi-Supervised Speech Recognition via Local Prior Matching

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose…

Computation and Language · Computer Science 2020-02-25 Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Dynamic Time-Aware Attention to Speaker Roles and Contexts for Spoken Language Understanding

Spoken language understanding (SLU) is an essential component in conversational systems. Most SLU component treats each utterance independently, and then the following components aggregate the multi-turn information in the separate phases.…

Computation and Language · Computer Science 2017-12-12 Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen

The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and…

Machine Learning · Computer Science 2025-06-03 Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones

Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation

Speech modeling methods learn one embedding for a fixed segment of speech, typically in between 10-25 ms. The information present in speech can be divided into two categories: "what is being said" (content) and "how it is expressed" (other)…

Computation and Language · Computer Science 2025-03-04 Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science 2024-08-27 Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Unsupervised Domain Discovery using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition

Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However the concept of domain is complex and not bound to clear criteria. Hence it is often not evident if data should be considered to…

Computation and Language · Computer Science 2015-09-23 Mortaza Doulaty, Oscar Saz, Thomas Hain

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server…

Sound · Computer Science 2021-10-04 Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays