Related papers: Segment Aggregation for short utterances speaker v…

Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings

The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems. This study aimed to develop an integrated text-independent speaker verification system that inputs…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-11 Jee-weon Jung, Hee-soo Heo, Hye-jin Shim, Ha-jin Yu

Quality Measures for Speaker Verification with Short Utterances

The performances of the automatic speaker verification (ASV) systems degrade due to the reduction in the amount of speech used for enrollment and verification. Combining multiple systems based on different features and classifiers…

Computer Vision and Pattern Recognition · Computer Science 2019-02-01 Arnab Poddar, Md Sahidullah, Goutam Saha

Deep Segment Attentive Embedding for Duration Robust Speaker Verification

LSTM-based speaker verification usually uses a fixed-length local segment randomly truncated from an utterance to learn the utterance-level speaker embedding, while using the average embedding of all segments of a test utterance to verify…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-05 Bin Liu, Shuai Nie, Yaping Zhang, Shan Liang, Wenju Liu

PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding

The presence of non-speech segments in utterances often leads to the performance degradation of speaker verification. Existing systems usually use voice activation detection as a preprocessing step to cut off long silence segments. However,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-21 Zijun Huang, Chengdong Liang, Jiadi Yao, Xiao-Lei Zhang

Graph Attention Networks for Speaker Verification

This work presents a novel back-end framework for speaker verification using graph attention networks. Segment-wise speaker embeddings extracted from multiple crops within an utterance are interpreted as node representations of a graph. The…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Currently, the most widely used approach for speaker verification is the deep speaker embedding learning. In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-09 Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim

Improving Short Utterance PLDA Speaker Verification using SUV Modelling and Utterance Partitioning Approach

This paper analyses the short utterance probabilistic linear discriminant analysis (PLDA) speaker verification with utterance partitioning and short utterance variance (SUV) modelling approaches. Experimental studies have found that instead…

Sound · Computer Science 2016-10-18 Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes

Universal speaker recognition encoders for different speech segments duration

Creating universal speaker encoders which are robust for different acoustic and speech duration conditions is a big challenge today. According to our observations systems trained on short speech segments are optimal for short phrase speaker…

Sound · Computer Science 2022-10-31 Sergey Novoselov, Vladimir Volokhov, Galina Lavrentyeva

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

One of the most important parts of an end-to-end speaker verification system is the speaker embedding generation. In our previous paper, we reported that shortcut connections-based multi-layer aggregation improves the representational power…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-29 Soonshin Seo, Ji-Hwan Kim

VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration…

Sound · Computer Science 2023-10-10 Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors

Automatic speaker verification (ASV) is the process to recognize persons using voice as biometric. The ASV systems show considerable recognition performance with sufficient amount of speech from matched condition. One of the crucial…

Multimedia · Computer Science 2018-12-04 Arnab Poddar, Md Sahidullah, Goutam Saha

RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation and extended dynamic scaling policies

Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems. To deal with this issue, we propose a speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Robust Speaker Recognition Using Speech Enhancement And Attention Model

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of…

Computation and Language · Computer Science 2020-05-25 Yanpei Shi, Qiang Huang, Thomas Hain

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

In practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-12 Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

Lightweight Speaker Verification for Online Identification of New Speakers with Short Segments

Verifying if two audio segments belong to the same speaker has been recently put forward as a flexible way to carry out speaker identification, since it does not require to be re-trained when new speakers appear on the auditory scene.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-26 Ivette Velez, Caleb Rascon, Gibran Fuentes-Pineda

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper…

Sound · Computer Science 2024-08-29 Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Length- and Noise-aware Training Techniques for Short-utterance Speaker Recognition

Speaker recognition performance has been greatly improved with the emergence of deep learning. Deep neural networks show the capacity to effectively deal with impacts of noise and reverberation, making them attractive to far-field speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Wenda Chen, Jonathan Huang, Tobias Bocklet

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi, Qiang Huang, Thomas Hain