Related papers: X-DC: Explainable Deep Clustering based on Learnab…

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-05 Lanhua You, Wu Guo, Lirong Dai, Jun Du

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…

Neural and Evolutionary Computing · Computer Science 2015-08-19 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

We explore why deep convolutional neural networks (CNNs) with small two-dimensional kernels, primarily used for modeling spatial relations in images, are also effective in speech recognition. We analyze the representations learned by deep…

Computation and Language · Computer Science 2018-11-13 Joanna Rownicka, Peter Bell, Steve Renals

Interpreting Black-box Machine Learning Models for High Dimensional Datasets

Deep neural networks (DNNs) have been shown to outperform traditional machine learning algorithms in a broad variety of application domains due to their effectiveness in modeling complex problems and handling high-dimensional datasets. Many…

Machine Learning · Computer Science 2024-07-30 Md. Rezaul Karim, Md. Shajalal, Alex Graß, Till Döhmen, Sisay Adugna Chala, Alexander Boden, Christian Beecks, Stefan Decker

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However,…

Machine Learning · Statistics 2017-11-30 Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

XAI for Self-supervised Clustering of Wireless Spectrum Activity

The so-called black-box deep learning (DL) models are increasingly used in classification tasks across many scientific disciplines, including the wireless communications domain. In this trend, supervised DL models appear as most commonly…

Machine Learning · Computer Science 2025-08-04 Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna, Mihael Mohorčič

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training (uPIT) have been demonstrated promising for speaker-independent speech separation. DC is usually formulated as two-step processes: embedding learning and embedding…

Sound · Computer Science 2019-07-24 Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen

Explaining Deep Neural Networks using Unsupervised Clustering

We propose a novel method to explain trained deep neural networks (DNNs), by distilling them into surrogate models using unsupervised clustering. Our method can be applied flexibly to any subset of layers of a DNN architecture and can…

Computer Vision and Pattern Recognition · Computer Science 2020-07-17 Yu-han Liu, Sercan O. Arik

Deep Discriminative Clustering Analysis

Traditional clustering methods often perform clustering with low-level indiscriminative representations and ignore relationships between patterns, resulting in slight achievements in the era of deep learning. To handle this problem, we…

Machine Learning · Computer Science 2019-05-07 Jianlong Chang, Yiwen Guo, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, Chunhong Pan

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Monaural speech dereverberation is a very challenging task because no spatial cues can be used. When the additive noises exist, this task becomes more challenging. In this paper, we propose a joint training method for simultaneous speech…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-07 Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen

Interpretable Factorization for Neural Network ECG Models

The ability of deep learning (DL) to improve the practice of medicine and its clinical outcomes faces a looming obstacle: model interpretation. Without description of how outputs are generated, a collaborating physician can neither resolve…

Machine Learning · Computer Science 2020-06-30 Christopher Snyder, Sriram Vishwanath

Deep Transfer Clustering of Radio Signals

Modulation recognition is an important task in radio signal processing. Most current research focuses on supervised learning. However, in many real scenarios, it is difficult and costly to obtain the labels of signals. In this letter,…

Signal Processing · Electrical Eng. & Systems 2021-07-27 Qi Xuan, Xiaohui Li, Zhuangzhi Chen, Dongwei Xu, Shilian Zheng, Xiaoniu Yang

Hierarchical interpretations for neural network predictions

Deep neural networks (DNNs) have achieved impressive predictive performance due to their ability to learn complex, non-linear relationships between variables. However, the inability to effectively visualize these relationships has led to…

Machine Learning · Computer Science 2019-01-17 Chandan Singh, W. James Murdoch, Bin Yu

Towards Debugging Deep Neural Networks by Generating Speech Utterances

Deep neural networks (DNN) are able to successfully process and classify speech utterances. However, understanding the reason behind a classification by DNN is difficult. One such debugging method used with image classification DNNs is…

Machine Learning · Computer Science 2019-07-09 Bilal Soomro, Anssi Kanervisto, Trung Ngo Trong, Ville Hautamäki

ECAPA-TDNN Embeddings for Speaker Diarization

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-14 Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na

A Survey on Understanding, Visualizations, and Explanation of Deep Neural Networks

Recent advancements in machine learning and signal processing domains have resulted in an extensive surge of interest in Deep Neural Networks (DNNs) due to their unprecedented performance and high accuracy for different and challenging…

Machine Learning · Computer Science 2021-02-04 Atefeh Shahroudnejad

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-15 Bin Gu, Wu Guo

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai, Weiqing Wang, Ming Li

Deep neural network techniques for monaural speech enhancement: state of the art analysis

Deep neural network (DNN) techniques have become pervasive in domains such as natural language processing and computer vision. They have achieved great success in these domains in tasks such as machine translation and image generation. Due…

Sound · Computer Science 2023-06-21 Peter Ochieng

Deep convolutional acoustic word embeddings using word-pair side information

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units. Such whole-word segmental systems rely on a function that maps a variable-length speech…

Computation and Language · Computer Science 2016-01-11 Herman Kamper, Weiran Wang, Karen Livescu