Related papers: Self-supervised Speaker Recognition Training Using…

200 papers

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Théo Lepage, Réda Dehak

This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-11 Aswin Sivaraman, Minje Kim

Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker…

Sound · Computer Science 2022-10-07 Yehoshua Dissen, Felix Kreuk, Joseph Keshet

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-30 Wangyou Zhang, Yanmin Qian

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be…

Sound · Computer Science 2020-11-02 Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-15 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Existing studies on self-supervised speech representation learning have focused on developing new training methods and applying pre-trained models for different applications. However, the quality of these models is often measured by the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-18 Alexander H. Liu, Sung-Lin Yeh, James Glass

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-19 Yu-Che Wang, Shrikant Venkataramani, Paris Smaragdis

Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-18 Gasser Elbanna

Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain…

Sound · Computer Science 2022-06-28 Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Neural network-based speaker recognition has achieved significant improvement in recent years. A robust speaker representation learns meaningful knowledge from both hard and easy samples in the training set to achieve good performance.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-31 Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science 2019-06-20 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Multi-party dialogue machine reading comprehension (MRC) brings tremendous challenge since it involves multiple speakers at one dialogue, resulting in intricate speaker information flows and noisy dialogue contexts. To alleviate such…

Computation and Language · Computer Science 2021-09-17 Yiyang Li, Hai Zhao

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years witness great successes in applying self-supervised learning in…

Computation and Language · Computer Science 2021-10-13 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-28 Aswin Sivaraman, Minje Kim

Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-03 Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of…

Computation and Language · Computer Science 2020-05-25 Yanpei Shi, Qiang Huang, Thomas Hain

In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis. First, we investigate how useful a pre-trained language model would be in a 2-step pipeline…

Computation and Language · Computer Science 2021-06-15 Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe