Related papers: X-DC: Explainable Deep Clustering based on Learnab…
The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the…
We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…
We explore why deep convolutional neural networks (CNNs) with small two-dimensional kernels, primarily used for modeling spatial relations in images, are also effective in speech recognition. We analyze the representations learned by deep…
Deep neural networks (DNNs) have been shown to outperform traditional machine learning algorithms in a broad variety of application domains due to their effectiveness in modeling complex problems and handling high-dimensional datasets. Many…
Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However,…
The so-called black-box deep learning (DL) models are increasingly used in classification tasks across many scientific disciplines, including wireless communications domain. In this trend, supervised DL models appear as most commonly…
Deep clustering (DC) and utterance-level permutation invariant training (uPIT) have been demonstrated promising for speaker-independent speech separation. DC is usually formulated as two-step processes: embedding learning and embedding…
We propose a novel method to explain trained deep neural networks (DNNs), by distilling them into surrogate models using unsupervised clustering. Our method can be applied flexibly to any subset of layers of a DNN architecture and can…
Traditional clustering methods often perform clustering with low-level indiscriminative representations and ignore relationships between patterns, resulting in slight achievements in the era of deep learning. To handle this problem, we…
Monaural speech dereverberation is a very challenging task because no spatial cues can be used. When the additive noises exist, this task becomes more challenging. In this paper, we propose a joint training method for simultaneous speech…
The ability of deep learning (DL) to improve the practice of medicine and its clinical outcomes faces a looming obstacle: model interpretation. Without description of how outputs are generated, a collaborating physician can neither resolve…
Modulation recognition is an important task in radio signal processing. Most of the current researches focus on supervised learning. However, in many real scenarios, it is difficult and cost to obtain the labels of signals. In this letter,…
Deep neural networks (DNNs) have achieved impressive predictive performance due to their ability to learn complex, non-linear relationships between variables. However, the inability to effectively visualize these relationships has led to…
Deep neural networks (DNN) are able to successfully process and classify speech utterances. However, understanding the reason behind a classification by DNN is difficult. One such debugging method used with image classification DNNs is…
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental…
Recent advancements in machine learning and signal processing domains have resulted in an extensive surge of interest in Deep Neural Networks (DNNs) due to their unprecedented performance and high accuracy for different and challenging…
This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…
In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…
Deep neural networks (DNN) techniques have become pervasive in domains such as natural language processing and computer vision. They have achieved great success in these domains in task such as machine translation and image generation. Due…
Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units. Such whole-word segmental systems rely on a function that maps a variable-length speech…