Related papers: Visualizing data augmentation in deep speaker reco…

Adversarial Data Augmentation for Robust Speaker Verification

Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural…

Sound · Computer Science 2024-02-07 Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create…

Sound · Computer Science 2024-06-12 Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

Reliable Visualization for Deep Speaker Recognition

In spite of the impressive success of convolutional neural networks (CNNs) in speaker recognition, our understanding of CNNs' internal functions is still limited. A major obstacle is that some popular visualization tools are difficult to…

Sound · Computer Science 2022-04-13 Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang

DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

Data augmentation is vital to the generalization ability and robustness of deep neural networks (DNNs) models. Existing augmentation methods for speaker verification manipulate the raw signal, which are time-consuming and the augmented…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-19 Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng

Data augmentation versus noise compensation for x-vector speaker recognition systems in noisy environments

The explosion of available speech data and new speaker modeling methods based on deep neural networks (DNN) have given the ability to develop more robust speaker recognition systems. Among DNN speaker modelling techniques, x-vector system…

Sound · Computer Science 2020-06-30 Mohammad Mohammadamini, Driss Matrouf

LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification

Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed, making them difficult to implement in an industrial environment. The Densely…

Computation and Language · Computer Science 2024-02-13 Di Cao, Xianchen Wang, Junfeng Zhou, Jiakai Zhang, Yanjing Lei, Wenpeng Chen

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple acoustic…

Sound · Computer Science 2021-04-26 Chau Luu, Peter Bell, Steve Renals

Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra…

Machine Learning · Computer Science 2023-10-30 Guozheng Ma, Linrui Zhang, Haoyu Wang, Lu Li, Zilin Wang, Zhen Wang, Li Shen, Xueqian Wang, Dacheng Tao

Analysis of Deep Feature Loss based Enhancement for Speaker Verification

Data augmentation is conventionally used to inject robustness in Speaker Verification systems. Several recently organized challenges focus on handling novel acoustic environments. Deep learning based speech enhancement is a modern solution…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-29 Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Najim Dehak

Robust Speaker Recognition Using Speech Enhancement And Attention Model

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of…

Computation and Language · Computer Science 2020-05-25 Yanpei Shi, Qiang Huang, Thomas Hain

Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference

This study focuses on the First VoicePrivacy Attacker Challenge within the ICASSP 2025 Signal Processing Grand Challenge, which aims to develop speaker verification systems capable of determining whether two anonymized speech signals are…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Yanzhe Zhang, Zhonghao Bi, Feiyang Xiao, Xuefeng Yang, Qiaoxi Zhu, Jian Guan

On the Impact of Interpretability Methods in Active Image Augmentation Method

Robustness is a significant constraint in machine learning models. The performance of the algorithms must not deteriorate when training and testing with slightly different data. Deep neural network models achieve awe-inspiring results in a…

Computer Vision and Pattern Recognition · Computer Science 2021-02-25 Flavio Santos, Cleber Zanchettin, Leonardo Matos, Paulo Novais

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

Speaker recognition is a biometric modality that utilizes the speaker's speech segments to recognize the identity, determining whether the test speaker belongs to one of the enrolled speakers. In order to improve the robustness of the…

Sound · Computer Science 2023-07-07 Zhifeng Wang, Chunyan Zeng, Surong Duan, Hongjie Ouyang, Hongmin Xu

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs.…

Computation and Language · Computer Science 2021-09-14 Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen

Robust Training for Speaker Verification against Noisy Labels

The deep learning models used for speaker verification rely heavily on large amounts of data and correct labeling. However, noisy (incorrect) labels often occur, which degrades the performance of the system. In this paper, we propose a…

Sound · Computer Science 2026-04-29 Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

The speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and thus attract a lot of interest to be applied for various downstream tasks. In this paper, we…

Sound · Computer Science 2022-01-25 Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng

TextCAM: Explaining Class Activation Map with Text

Deep neural networks (DNNs) have achieved remarkable success across domains but remain difficult to interpret, limiting their trustworthiness in high-stakes applications. This paper focuses on deep vision models, for which a dominant line…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Qiming Zhao, Xingjian Li, Xiaoyu Cao, Xiaolong Wu, Min Xu

Visual Speech Enhancement

When video is shot in noisy environment, the voice of a speaker seen in the video can be enhanced using the visible mouth movements, reducing background noise. While most existing methods use audio-only inputs, improved performance is…

Computer Vision and Pattern Recognition · Computer Science 2018-06-14 Aviv Gabbay, Asaph Shamir, Shmuel Peleg

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

We propose a conceptually simple and lightweight framework for improving the robustness of vision models through the combination of knowledge distillation and data augmentation. We address the conjecture that larger models do not make for…

Machine Learning · Computer Science 2024-02-06 Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang