Related papers: A Deep Representation Learning-based Speech Enhanc…

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-28 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

A deep representation learning speech enhancement method using $\beta$-VAE

In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Recently, variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply VAE to the model speech…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-25 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-based Single-channel Speech Enhancement

Recently, a complex variational autoencoder (VAE)-based single-channel speech enhancement system based on the DCCRN architecture has been proposed. In this system, a noise suppression VAE (NSVAE) learns to extract clean speech…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-03 Jiatong Li, Simon Doclo

Complex Recurrent Variational Autoencoder with Application to Speech Enhancement

As an extension of variational autoencoder (VAE), complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework, specifically in which complex-valued recurrent…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-28 Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems

In this paper we introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function that can generate more meaningful electroencephalography (EEG) features from raw EEG features to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-05 Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

A Recurrent Variational Autoencoder for Speech Enhancement

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix…

Machine Learning · Computer Science 2020-02-11 Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network…

Sound · Computer Science 2019-03-12 Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then…

Sound · Computer Science 2020-12-18 Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

AEVB-Comm: An Intelligent Communication System based on AEVBs

In recent years, applying Deep Learning (DL) techniques emerged as a common practice in the communication system, demonstrating promising results. The present paper proposes a new Convolutional Neural Network (CNN) based Variational…

Signal Processing · Electrical Eng. & Systems 2020-05-20 Raghu Vamshi Hemadri, Akshay Rayaluru, Rahul Jashvantbhai Pandya

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or speech spectrum, via a naive convolution…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-24 Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent…

Sound · Computer Science 2022-04-19 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent…

Sound · Computer Science 2018-01-25 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning

Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq…

Computation and Language · Computer Science 2018-06-05 Myeongjun Jang, Seungwan Seo, Pilsung Kang

Deep Feature Consistent Variational Autoencoder

We present a novel method for constructing Variational Autoencoder (VAE). Instead of using pixel-by-pixel loss, we enforce deep feature consistency between the input and the output of a VAE, which ensures the VAE's output to preserve the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Xianxu Hou, Linlin Shen, Ke Sun, Guoping Qiu

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

Deep clustering with fusion autoencoder

Embracing the deep learning techniques for representation learning in clustering research has attracted broad attention in recent years, yielding a newly developed clustering paradigm, viz. the deep clustering (DC). Typically, the DC models…

Machine Learning · Computer Science 2022-01-17 Shuai Chang

Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement

Recently, a variational autoencoder (VAE)-based single-channel speech enhancement system using Bayesian permutation training has been proposed, which uses two pretrained VAEs to obtain latent representations for speech and noise. Based on…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-03 Jiatong Li, Simon Doclo

Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification

Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount - yet difficult to detect reliably. The generalization failure of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-24 Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos