Related papers: Learning Disentangled Speech Representations

200 papers

This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground truth explanatory factors for evaluating disentanglement techniques.…

Sound · Computer Science 2024-02-19 Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann

Disentanglement is the task of learning representations that identify and separate factors that explain the variation observed in data. Disentangled representations are useful to increase the generalizability, explainability, and fairness…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-09 Michael Kuhlmann, Adrian Meise, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science 2024-08-27 Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Tools to generate high-quality synthetic speech signals that are perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these…

The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-08 Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks…

The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and…

Computation and Language · Computer Science 2025-07-22 Varun Krishna, Sriram Ganapathy

Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process. Nevertheless, there is no…

Machine Learning · Computer Science 2024-10-08 Ruoyu Wang, Lina Yao

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker…

End-to-end transformer-based automatic speech recognition (ASR) systems often capture multiple speech traits in their learned representations that are highly entangled, leading to a lack of interpretability. In this study, we propose the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-28 Pu Wang, Hugo Van hamme

The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan

Disentangled representation learning aims to extract explanatory features or factors and retain salient information. Factorized hierarchical variational autoencoder (FHVAE) presents a way to disentangle a speech signal into sequential-level…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. First, we characterize the concept "disentangled…

Machine Learning · Computer Science 2021-03-22 Kien Do, Truyen Tran

Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to…

Machine Learning · Computer Science 2022-04-11 Sichen Zhao, Wei Shao, Jeffrey Chan, Flora D. Salim

Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer…

Machine Learning · Computer Science 2023-02-06 John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar

For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-02 Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

Disentangling the encodings of neural models is a fundamental aspect for improving interpretability, semantic control and downstream task performance in Natural Language Processing. Currently, most disentanglement methods are unsupervised…

Computation and Language · Computer Science 2023-02-17 Danilo S. Carvalho, Giangiacomo Mercatali, Yingji Zhang, Andre Freitas

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into…

Sound · Computer Science 2024-06-21 KiHyun Nam, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung