Related papers: Dictionary Update for NMF-based Voice Conversion U…
Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms…
In this paper, we present RT-GCC-NMF: a real-time (RT), two-channel blind speech enhancement algorithm that combines the non-negative matrix factorization (NMF) dictionary learning algorithm with the generalized cross-correlation (GCC)…
Discriminative models for source separation have recently been shown to produce impressive results. However, when operating on sources outside of the training set, these models can not perform as well and are cumbersome to update. Classical…
This paper investigates a non-negative matrix factorization (NMF)-based approach to the semi-supervised single-channel speech enhancement problem where only non-stationary additive noise signals are given. The proposed method relies on…
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised…
Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like…
High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an…
Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process. The encoder summarizes the representation of input sentence from scratch, which is potentially a problem if…
Mel-frequency filter bank (MFB) based approaches have the advantage of learning speech compared to raw spectrum since MFB has less feature size. However, speech generator with MFB approaches require additional vocoder that needs a huge…
For most of the state-of-the-art speech enhancement techniques, a spectrogram is usually preferred than the respective time-domain raw data since it reveals more compact presentation together with conspicuous temporal information over a…
This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum…
Non-negative Matrix Factorization (NMF) has already been applied to learn speaker characterizations from single or non-simultaneous speech for speaker recognition applications. It is also known for its good performance in (blind) source…
Nonnegative Matrix Factorization (NMF) is a powerful tool for decomposing mixtures of audio signals in the Time-Frequency (TF) domain. In the source separation framework, the phase recovery for each extracted component is necessary for…
Adapting End-to-End ASR models to out-of-domain datasets with text data is challenging. Factorized neural Transducer (FNT) aims to address this issue by introducing a separate vocabulary decoder to predict the vocabulary. Nonetheless, this…
Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to…
The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance.…
This paper presents two single channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a nonnegative approximation…
Recent improvements in computing allow for the processing and analysis of very large datasets in a variety of fields. Often the analysis requires the creation of low-rank approximations to the datasets leading to efficient storage. This…
Convolutive Non-Negative Matrix Factorization model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network…
Non-negative matrix factorization (NMF) is an important tool in signal processing and widely used to separate mixed sources into their components. Algorithms for NMF require that the user choose the number of components in advance, and if…