Related papers: Learning Disentangled Speech Representations

Learning Disentangled Audio Representations through Controlled Synthesis

This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground truth explanatory factors for evaluating disentanglement techniques.…

Sound · Computer Science 2024-02-19 Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann

Investigating Speaker Embedding Disentanglement on Natural Read Speech

Disentanglement is the task of learning representations that identify and separate factors that explain the variation observed in data. Disentangled representations are useful to increase the generalizability, explainability, and fairness…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-09 Michael Kuhlmann, Adrian Meise, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science 2024-08-27 Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection

Tools to generate high quality synthetic speech signal that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these…

Sound · Computer Science 2023-08-01 Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Disentangled representation learning for multilingual speaker recognition

The goal of this paper is to learn robust speaker representation for bilingual speaking scenario. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-08 Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks…

Sound · Computer Science 2022-06-27 Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and…

Computation and Language · Computer Science 2025-07-22 Varun Krishna, Sriram Ganapathy

Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

Independence Constrained Disentangled Representation Learning from Epistemological Perspective

Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process. Nevertheless, there is no…

Machine Learning · Computer Science 2024-10-08 Ruoyu Wang, Lina Yao

Intra-class variation reduction of speaker representation in disentanglement framework

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker…

Sound · Computer Science 2021-07-28 Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation

End-to-end transformer-based automatic speech recognition (ASR) systems often capture multiple speech traits in their learned representations that are highly entangled, leading to a lack of interpretability. In this study, we propose the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-28 Pu Wang, Hugo Van hamme

An empirical analysis of information encoded in disentangled neural speaker representations

The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan

Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective

Disentangled representation learning aims to extract explanatory features or factors and retain salient information. Factorized hierarchical variational autoencoder (FHVAE) presents a way to disentangle a speech signal into sequential-level…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Theory and Evaluation Metrics for Learning Disentangled Representations

We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. First, we characterize the concept "disentangled…

Machine Learning · Computer Science 2021-03-22 Kien Do, Truyen Tran

Measuring disentangled generative spatio-temporal representation

Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to…

Machine Learning · Computer Science 2022-04-11 Sichen Zhao, Wei Shao, Jeffrey Chan, Flora D. Salim

SPADE: Self-supervised Pretraining for Acoustic DisEntanglement

Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer…

Machine Learning · Computer Science 2023-02-06 John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar

Disentangling Voice and Content with Self-Supervision for Speaker Recognition

For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-02 Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

Learning Disentangled Representations for Natural Language Definitions

Disentangling the encodings of neural models is a fundamental aspect for improving interpretability, semantic control and downstream task performance in Natural Language Processing. Currently, most disentanglement methods are unsupervised…

Computation and Language · Computer Science 2023-02-17 Danilo S. Carvalho, Giangiacomo Mercatali, Yingji Zhang, Andre Freitas

Disentangled Representation Learning for Environment-agnostic Speaker Recognition

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into…

Sound · Computer Science 2024-06-21 KiHyun Nam, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung