Related papers: Variational Auto-Encoder Based Variability Encodin…

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-18 Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then…

Sound · Computer Science 2020-12-18 Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to model time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that…

Sound · Computer Science 2022-10-04 Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-21 Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion

Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training. Unlike the majority of the research in VAE-VC which focuses on utilizing auxiliary losses or…

Sound · Computer Science 2021-12-07 Kei Akuzawa, Kotaro Onishi, Keisuke Takiguchi, Kohki Mametani, Koichiro Mori

Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training

The scarcity of training data and the large speaker variation in dysarthric speech lead to poor accuracy and poor speaker generalization of spoken language understanding systems for dysarthric speech. Through work on the speech features, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-25 Jinzi Qi, Hugo Van hamme

DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection

Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these…

Sound · Computer Science 2023-08-01 Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems

In this paper we introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function that can generate more meaningful electroencephalography (EEG) features from raw EEG features to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-05 Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

Recently, audio-visual speech enhancement has been tackled in the unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Mostafa Sadeghi, Xavier Alameda-Pineda

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech including accent, age or gender, when further compounded with the underlying causes of speech…

Sound · Computer Science 2022-01-20 Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng

A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms…

Sound · Computer Science 2019-05-15 Manuel Pariente, Antoine Deleforge, Emmanuel Vincent

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

Though significant progress has been made for the voice conversion (VC) of typical speech, VC for atypical speech, e.g., dysarthric and second-language (L2) speech, remains a challenge, since it involves correcting for atypical prosody…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Denoising Criterion for Variational Auto-Encoding Framework

Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer…

Machine Learning · Computer Science 2016-01-05 Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. Sources of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-18 Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

Variational decomposition autoencoding improves disentanglement of latent representations

Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn…

Machine Learning · Computer Science 2026-01-13 Ioannis Ziogas, Aamna Al Shehhi, Ahsan H. Khandoker, Leontios J. Hadjileontiadis

VAE-based Domain Adaptation for Speaker Verification

Deep speaker embedding has achieved satisfactory performance in speaker verification. By enforcing the neural model to discriminate the speakers in the training set, deep speaker embedding (called `x-vectors`) can be derived from the hidden…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-28 Xueyi Wang, Lantian Li, Dong Wang

Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks

For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis. Unsupervised discrete subword modelling could be useful for…

Computation and Language · Computer Science 2019-07-01 Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper