Related papers: Variational Auto-Encoder Based Variability Encodin…

200 papers

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-18 Huajian Fang , Guillaume Carbajal , Stefan Wermter , Timo Gerkmann

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then…

Sound · Computer Science 2020-12-18 Mostafa Sadeghi , Simon Leglaive , Xavier Alameda-Pineda , Laurent Girin , Radu Horaud

Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to model time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that…

Sound · Computer Science 2022-10-04 Xiaoyu Bie , Simon Leglaive , Xavier Alameda-Pineda , Laurent Girin

Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-21 Zengrui Jin , Xurong Xie , Mengzhe Geng , Tianzi Wang , Shujie Hu , Jiajun Deng , Guinan Li , Xunying Liu

Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training. Unlike the majority of the research in VAE-VC which focuses on utilizing auxiliary losses or…

Sound · Computer Science 2021-12-07 Kei Akuzawa , Kotaro Onishi , Keisuke Takiguchi , Kohki Mametani , Koichiro Mori

The scarcity of training data and the large speaker variation in dysarthric speech lead to poor accuracy and poor speaker generalization of spoken language understanding systems for dysarthric speech. Through work on the speech features, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-25 Jinzi Qi , Hugo Van hamme

Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these…

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie , Laurent Girin , Simon Leglaive , Thomas Hueber , Xavier Alameda-Pineda

In this paper we introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function that can generate more meaningful electroencephalography (EEG) features from raw EEG features to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-05 Gautam Krishna , Co Tran , Mason Carnahan , Ahmed Tewfik

Recently, audio-visual speech enhancement has been tackled in the unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Mostafa Sadeghi , Xavier Alameda-Pineda

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski , Ron J. Weiss , Samy Bengio , Aäron van den Oord

Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech including accent, age or gender, when further compounded with the underlying causes of speech…

Sound · Computer Science 2022-01-20 Mengzhe Geng , Shansong Liu , Jianwei Yu , Xurong Xie , Shoukang Hu , Zi Ye , Zengrui Jin , Xunying Liu , Helen Meng

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms…

Sound · Computer Science 2019-05-15 Manuel Pariente , Antoine Deleforge , Emmanuel Vincent

Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Yuying Xie , Thomas Arildsen , Zheng-Hua Tan

Though significant progress has been made for the voice conversion (VC) of typical speech, VC for atypical speech, e.g., dysarthric and second-language (L2) speech, remains a challenge, since it involves correcting for atypical prosody…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Disong Wang , Songxiang Liu , Lifa Sun , Xixin Wu , Xunying Liu , Helen Meng

Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer…

Machine Learning · Computer Science 2016-01-05 Daniel Jiwoong Im , Sungjin Ahn , Roland Memisevic , Yoshua Bengio

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. Sources of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-18 Mengzhe Geng , Xurong Xie , Zi Ye , Tianzi Wang , Guinan Li , Shujie Hu , Xunying Liu , Helen Meng

Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn…

Machine Learning · Computer Science 2026-01-13 Ioannis Ziogas , Aamna Al Shehhi , Ahsan H. Khandoker , Leontios J. Hadjileontiadis

Deep speaker embedding has achieved satisfactory performance in speaker verification. By enforcing the neural model to discriminate the speakers in the training set, deep speaker embedding (called `x-vectors`) can be derived from the hidden…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-28 Xueyi Wang , Lantian Li , Dong Wang

For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis. Unsupervised discrete subword modelling could be useful for…