Related papers: CNN-LSTM models for Multi-Speaker Source Separatio…

Analysis of memory in LSTM-RNNs for source separation

Long short-term memory recurrent neural networks (LSTM-RNNs) are considered state-of-the-art in many speech processing tasks. The recurrence in the network, in principle, allows any input to be remembered for an indefinite time, a feature…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-02 Jeroen Zegers, Hugo Van hamme

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

Deep neural networks have become an indispensable technique for audio source separation (ASS). It was recently reported that a variant of the CNN architecture called MMDenseNet was successfully employed to solve the ASS problem of estimating…

Sound · Computer Science 2018-05-30 Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji

Atss-Net: Target Speaker Separation via Attention-based Neural Network

Recently, Convolutional Neural Network (CNN) and Long short-term memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an Attention-based neural network (Atss-Net) in the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-20 Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li

Bayesian Neural Network Language Modeling for Speech Recognition

State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when…

Computation and Language · Computer Science 2022-08-30 Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng

Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they are able to provide the learned dynamically changing contextual window of all…

Computation and Language · Computer Science 2016-10-12 Xiangang Li, Xihong Wu

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a lot of supervised data to achieve excellent performance. Although…

Sound · Computer Science 2019-08-30 Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

Parallel Long Short-Term Memory for Multi-stream Classification

Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells…

Machine Learning · Computer Science 2017-02-15 Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation

Convolutional Neural Network (CNN) or Long short-term memory (LSTM) based models with the input of spectrogram or waveforms are commonly used for deep learning based audio source separation. In this paper, we propose a Sliced…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-20 Tingle Li, Jiawei Chen, Haowen Hou, Ming Li

Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model

The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multi-speaker scenario. It has been recently shown that we can quantitatively evaluate the segregation capability by modelling the…

Sound · Computer Science 2021-07-12 Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation

Speaker-independent speech separation has achieved remarkable performance in recent years with the development of deep neural networks (DNNs). Various network architectures, from traditional convolutional neural network (CNN) and recurrent…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-17 Xue Yang, Changchun Bao

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters

In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture. In contrast to single-channel approaches, which rely on the different spectro-temporal characteristics of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-11 Kristina Tesch, Timo Gerkmann

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition, because…

Computation and Language · Computer Science 2017-03-22 Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic…

Neural and Evolutionary Computing · Computer Science 2014-02-06 Haşim Sak, Andrew Senior, Françoise Beaufays

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang, Antoni B. Chan

Neural Speech Separation Using Spatially Distributed Microphones

This paper proposes a neural network based speech separation method using spatially distributed microphones. Unlike with traditional microphone array settings, neither the number of microphones nor their spatial arrangement is known in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-01 Dongmei Wang, Zhuo Chen, Takuya Yoshioka

Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

Deep learning has achieved substantial improvement on single-channel speech enhancement tasks. However, the performance of multi-layer perceptron (MLP)-based methods is limited by the ability to capture the long-term effective history…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Qiquan Zhang, Aaron Nicolson, Mingjiang Wang, Kuldip K. Paliwal, Chenxu Wang

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement

In recent years, Long Short-Term Memory (LSTM) has become a popular choice for speech separation and speech enhancement tasks. The capability of an LSTM network can be enhanced by widening and adding more layers. However, this would introduce…

Sound · Computer Science 2018-12-27 Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh

Memory Time Span in LSTMs for Multi-Speaker Source Separation

With deep learning approaches becoming state-of-the-art in many speech (as well as non-speech) related machine learning tasks, efforts are being made to delve into the neural networks, which are often considered a black box. In this…

Machine Learning · Computer Science 2018-08-27 Jeroen Zegers, Hugo Van hamme

Go Beyond Multiple Instance Neural Networks: Deep-learning Models based on Local Pattern Aggregation

Deep convolutional neural networks (CNNs) have brought breakthroughs in processing clinical electrocardiograms (ECGs), speaker-independent speech and complex images. However, typical CNNs require a fixed input size while it is common to…

Machine Learning · Computer Science 2022-10-07 Linpeng Jin

Dual Convolutional LSTM Network for Referring Image Segmentation

We consider referring image segmentation. It is a problem at the intersection of computer vision and natural language understanding. Given an input image and a referring expression in the form of a natural language sentence, the goal is to…

Computer Vision and Pattern Recognition · Computer Science 2020-02-03 Linwei Ye, Zhi Liu, Yang Wang