Related papers: Error Reduction Network for DBLSTM-based Voice Con…

Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, in this research, deep extensions on…

Computation and Language · Computer Science 2015-05-12 Xiangang Li, Xihong Wu

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition, because…

Computation and Language · Computer Science 2017-03-22 Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang

Cumulative Adaptation for BLSTM Acoustic Models

This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of…

Computation and Language · Computer Science 2019-06-17 Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than deep feed-forward neural networks (DNNs).…

Sound · Computer Science 2020-11-12 Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Bayesian Neural Network Language Modeling for Speech Recognition

State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when…

Computation and Language · Computer Science 2022-08-30 Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng

Bayesian Transformer Language Models for Speech Recognition

State-of-the-art neural language models (LMs) represented by Transformers are highly complex. Their use of fixed, deterministic parameter estimates fails to account for model uncertainty and leads to over-fitting and poor generalization when…

Computation and Language · Computer Science 2021-02-10 Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning

Typically, voice conversion is regarded as an engineering problem with limited training data. The reliance on massive amounts of data hinders the practical applicability of deep learning approaches, which have been extensively researched in…

Sound · Computer Science 2023-09-11 Mohamadreza Jafaryani, Hamid Sheikhzadeh, Vahid Pourahmadi

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody. However, the model complexity…

Computation and Language · Computer Science 2018-02-27 Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. The deep learning based text-dependent speaker verification system generally needs a large…

Sound · Computer Science 2020-11-24 Xiaoyi Qin, Yaogen Yang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li

Data-selective Transfer Learning for Multi-Domain Speech Recognition

Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics. This paper proposes a novel technique to overcome negative transfer by…

Machine Learning · Computer Science 2015-09-18 Mortaza Doulaty, Oscar Saz, Thomas Hain

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in enhancing noisy speech signals. Enhancement is achieved by learning a nonlinear mapping function from the features of the corrupted speech signal to that of the reference clean speech…

Machine Learning · Computer Science 2016-06-16 Zhenzhou Wu, Sunil Sivadas, Yong Kiam Tan, Ma Bin, Rick Siow Mong Goh

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility

Speech intelligibility can be degraded due to multiple factors, such as noisy environments, technical difficulties or biological conditions. This work is focused on the development of an automatic non-intrusive system for predicting the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-07 Miguel Fernández-Díaz, Ascensión Gallardo-Antolín

Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-10-17 Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan

Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs

Data augmentation has proven to be a promising prospect in improving the performance of deep learning models by adding variability to training data. In previous work with developing a noise robust acoustic-to-articulatory speech inversion…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-02 Yashish M. Siriwardena, Ahmed Adel Attia, Ganesh Sivaraman, Carol Espy-Wilson

On using 2D sequence-to-sequence models for speech recognition

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

A Deep Learning Approach for Similar Languages, Varieties and Dialects

Deep learning mechanisms have recently become the prevailing approach for various tasks in natural language processing, speech recognition, image processing, and many others. To leverage this, we use a deep learning based mechanism specifically…

Computation and Language · Computer Science 2019-01-03 Vidya Prasad K, Akarsh S, Vinayakumar R, Soman KP

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-18 Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel

Optimizing voice conversion network with cycle consistency loss of speaker identity

We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only frame-level spectral loss, but also speaker identity loss. We introduce a cycle…

Sound · Computer Science 2020-11-18 Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li

Bayesian Learning for Deep Neural Network Adaptation

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Speaker adaptation techniques play a vital role to reduce the mismatch. Model-based…

Sound · Computer Science 2024-06-17 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang