Related papers: Computing with Hypervectors for Efficient Speaker …

VoxCeleb2: Deep Speaker Recognition

The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media.…

Sound · Computer Science 2020-11-05 Joon Son Chung, Arsha Nagrani, Andrew Zisserman

Training speaker recognition systems with limited data

This work considers training neural networks for speaker recognition with a much smaller dataset size compared to contemporary work. We artificially restrict the amount of data by proposing three subsets of the popular VoxCeleb2 dataset.…

Sound · Computer Science 2023-02-28 Nik Vaessen, David A. van Leeuwen

Speaker recognition with a MLP classifier and LPCC codebook

This paper improves the speaker recognition rates of an MLP classifier and an LPCC codebook alone, using a linear combination of both methods. In simulations we have obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and…

Sound · Computer Science 2022-03-23 Daniel Rodriguez-Porcheron, Marcos Faundez-Zanuy

Weakly Supervised Training of Speaker Identification Models

We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the…

Sound · Computer Science 2018-06-25 Martin Karu, Tanel Alumäe

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Wav2vec2 has achieved success in applying Transformer architecture and self-supervised learning to speech recognition. Recently, these have come to be used not only for speech recognition but also for speech processing as a whole. This…

Sound · Computer Science 2023-09-12 Harunori Kawano, Sota Shimizu

Neural Network Based Speaker Classification and Verification Systems with Enhanced Features

This work presents a novel framework based on feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition. With optimized features and model training, it achieves 100%…

Sound · Computer Science 2017-03-20 Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju

Speaker Change Detection Using Features through A Neural Network Speaker Classifier

The mechanism proposed here is for real-time speaker change detection in conversations, which firstly trains a neural network text-independent speaker classifier using in-domain speaker data. Through the network, features of conversational…

Sound · Computer Science 2017-03-20 Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Aravind Ganapathiraju

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the…

Sound · Computer Science 2020-03-18 Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin

Speaker Verification Using Simple Temporal Features and Pitch Synchronous Cepstral Coefficients

Speaker verification is the process by which a speaker's claim of identity is tested against a claimed speaker by his or her voice. Speaker verification is done by the use of some parameters (features) from the speaker's voice which can be…

Sound · Computer Science 2019-08-16 Bhavana V. S, Pradip K. Das

Estimating Uniqueness of I-Vector Representation of Human Voice

We study the individuality of the human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-04 Erkam Sinan Tandogan, Husrev Taha Sencar

Multichannel CRNN for Speaker Counting: an Analysis of Performance

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the…

Sound · Computer Science 2021-01-07 Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. This paper proposes a hierarchical network with transformer encoders and memory mechanism to address this problem. The proposed…

Sound · Computer Science 2020-11-02 Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain

SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

We propose SpeakerNet - a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch-normalization, and ReLU layers. This architecture…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-27 Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg

Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal…

Sound · Computer Science 2024-11-25 Irfan Nafiz Shahan, Pulok Ahmed Auvi

Improving Speaker Verification with Self-Pretrained Transformer Models

Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received a rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-19 Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký

Speaker Recognition using SincNet and X-Vector Fusion

In this paper, we propose an innovative approach to perform speaker recognition by fusing two recently introduced deep neural networks (DNNs) namely - SincNet and X-Vector. The idea behind using SincNet filters on the raw speech waveform is…

Computation and Language · Computer Science 2020-04-07 Mayank Tripathi, Divyanshu Singh, Seba Susan

A Fast Audio Clustering Using Vector Quantization and Second Order Statistics

This paper describes an effective unsupervised speaker indexing approach. We suggest a two stage algorithm to speed-up the state-of-the-art algorithm based on the Bayesian Information Criterion (BIC). In the first stage of the merging…

Sound · Computer Science 2010-09-27 Konstantin Biatov

Robust Speaker Recognition with Transformers Using wav2vec 2.0

Recent advances in unsupervised speech representation learning discover new approaches and provide new state-of-the-art for diverse types of speech processing tasks. This paper presents an investigation of using wav2vec 2.0 deep speech…

Sound · Computer Science 2022-03-30 Sergey Novoselov, Galina Lavrentyeva, Anastasia Avdeeva, Vladimir Volokhov, Aleksei Gusev

Short-Segment Speaker Verification with Pre-trained Models and Multi-Resolution Encoder

Speaker verification (SV) utilizing features obtained from models pre-trained via self-supervised learning has recently demonstrated impressive performances. However, these pre-trained models (PTMs) usually have a temporal resolution of 20…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-28 Jisoo Myoung, Sangwook Han, Kihyuk Kim, Jong Won Shin