Related papers: Self-supervised Speaker Recognition Training Using…

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Théo Lepage, Réda Dehak

Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-11 Aswin Sivaraman, Minje Kim

Self-supervised Speaker Diarization

Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker…

Sound · Computer Science 2022-10-07 Yehoshua Dissen, Felix Kreuk, Joseph Keshet

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-30 Wangyou Zhang, Yanmin Qian

Augmentation adversarial training for self-supervised speaker recognition

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be…

Sound · Computer Science 2020-11-02 Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

Self-supervised Speaker Recognition with Loss-gated Learning

In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-15 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

Existing studies on self-supervised speech representation learning have focused on developing new training methods and applying pre-trained models for different applications. However, the quality of these models is often measured by the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-18 Alexander H. Liu, Sung-Lin Yeh, James Glass

Self-supervised Learning for Speech Enhancement

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-19 Yu-Che Wang, Shrikant Venkataramani, Paris Smaragdis

Evaluating Speaker Identity Coding in Self-supervised Models and Humans

Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-18 Gasser Elbanna

Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain…

Sound · Computer Science 2022-06-28 Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Speaker recognition with two-step multi-modal deep cleansing

Neural network-based speaker recognition has achieved significant improvement in recent years. A robust speaker representation learns meaningful knowledge from both hard and easy samples in the training set to achieve good performance.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-31 Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science 2019-06-20 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension

Multi-party dialogue machine reading comprehension (MRC) brings tremendous challenge since it involves multiple speakers at one dialogue, resulting in intricate speaker information flows and noisy dialogue contexts. To alleviate such…

Computation and Language · Computer Science 2021-09-17 Yiyang Li, Hai Zhao

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years witness great successes in applying self-supervised learning in…

Computation and Language · Computer Science 2021-10-13 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Efficient Personalized Speech Enhancement through Self-Supervised Learning

This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-28 Aswin Sivaraman, Minje Kim

Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-03 Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

Robust Speaker Recognition Using Speech Enhancement And Attention Model

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of…

Computation and Language · Computer Science 2020-05-25 Yanpei Shi, Qiang Huang, Thomas Hain

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis. First, we investigate how useful a pre-trained language model would be in a 2-step pipeline…

Computation and Language · Computer Science 2021-06-15 Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe