Related papers: Guided Speaker Embedding

Mitigating Non-Target Speaker Bias in Guided Speaker Embedding

Obtaining high-quality speaker embeddings in multi-speaker conditions is crucial for many applications. A recently proposed guided speaker embedding framework, which utilizes speech activities of target and non-target speakers as clues,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-17 · Shota Horiguchi, Takanori Ashihara, Marc Delcroix, Atsushi Ando, Naohiro Tawara

Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings

This paper proposes a method for extracting a speaker embedding for each speaker from a variable-length recording containing multiple speakers. Speaker embeddings are crucial not only for speaker recognition but also for various multi-speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-02 · Shota Horiguchi, Atsushi Ando, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix

Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification

The performance of speaker verification degrades significantly when the test speech is corrupted by interfering speakers. Speaker diarization does well at separating speakers if the speakers are not temporally overlapped. However, if…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-02-08 · Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li

ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting

The speaker extraction technique seeks to single out the voice of a target speaker from the interfering voices in a speech mixture. Typically an auxiliary reference of the target speaker is used to form voluntary attention. Either a…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-10 · Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios

End-to-end neural speaker diarization systems are able to address the speaker diarization task while effectively handling speech overlap. This work explores the incorporation of speaker information embeddings into the end-to-end systems to…

Sound · Computer Science · 2024-07-02 · Juan Ignacio Alvarez-Trejos, Beltrán Labrador, Alicia Lozano-Diez

Multi-scale speaker embedding-based graph attention networks for speaker diarisation

The objective of this work is effective speaker diarisation using multi-scale speaker embeddings. Typically, there is a trade-off between the ability to recognise short speaker segments and the discriminative power of the embedding,…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-11 · Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

Sound · Computer Science · 2025-08-11 · Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science · 2021-02-11 · Zeqian Li, Jacob Whitehill

Informed Source Extraction With Application to Acoustic Echo Reduction

Informed speaker extraction aims to extract a target speech signal from a mixture of sources given prior knowledge about the desired speaker. Recent deep learning-based methods leverage a speaker discriminative model that maps a reference…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-17 · Mohamed Elminshawi, Wolfgang Mack, Emanuël A. P. Habets

DNN Speaker Tracking with Embeddings

In multi-speaker applications, it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science · 2020-07-21 · Carlos Rodrigo Castillo-Sanchez, Leibny Paola Garcia-Perera, Anabel Martin-Gonzalez

Adapting Speaker Embeddings for Speaker Diarisation

The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation. The quality of speaker embeddings is paramount to the performance of speaker diarisation systems. Despite this, prior works in the field…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-04-08 · Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

We introduce a monaural neural speaker embeddings extractor that computes an embedding for each speaker present in a speech mixture. To allow for supervised training, a teacher-student approach is employed: the teacher computes the target…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-20 · Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

End-to-end Online Speaker Diarization with Target Speaker Tracking

This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from the clustering-based diarization system to obtain the target speaker embeddings. By…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-10-16 · Weiqing Wang, Ming Li

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

Speaker extraction requires a speech sample from the target speaker as the reference. However, enrolling a speaker with a long speech sample is not practical. We propose a speaker extraction technique that performs in multiple stages to take full…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-04-05 · Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

In search of strong embedding extractors for speaker diarisation

Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we…

Sound · Computer Science · 2022-10-27 · Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization

Current speaker diarization systems rely on an external voice activity detection model prior to speaker embedding extraction on the detected speech segments. In this paper, we establish that the attention system of a speaker embedding…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-05-16 · Jenthe Thienpondt, Kris Demuynck

Speaker-independent Speech Separation with Deep Attractor Network

Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first reason is the arbitrary order of the target and…

Sound · Computer Science · 2018-04-19 · Yi Luo, Zhuo Chen, Nima Mesgarani

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture,…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-05 · Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech

Target speaker extraction aims to extract the speech of a specific speaker from a multi-talker mixture as specified by an auxiliary reference. Most studies focus on the scenario where the target speech is highly overlapped with the…

Sound · Computer Science · 2023-09-18 · Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li

Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as…

Sound · Computer Science · 2022-11-04 · You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung