Related papers: Computing with Hypervectors for Efficient Speaker …

200 papers

The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media.…

Sound · Computer Science 2020-11-05 Joon Son Chung, Arsha Nagrani, Andrew Zisserman

This work considers training neural networks for speaker recognition with a much smaller dataset size compared to contemporary work. We artificially restrict the amount of data by proposing three subsets of the popular VoxCeleb2 dataset.…

Sound · Computer Science 2023-02-28 Nik Vaessen, David A. van Leeuwen

This paper improves the speaker recognition rates of an MLP classifier and an LPCC codebook alone, using a linear combination of both methods. In simulations we have obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and…

Sound · Computer Science 2022-03-23 Daniel Rodriguez-Porcheron, Marcos Faundez-Zanuy

We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the…

Sound · Computer Science 2018-06-25 Martin Karu, Tanel Alumäe

Wav2vec2 has achieved success in applying the Transformer architecture and self-supervised learning to speech recognition. Recently, these have come to be used not only for speech recognition but also across speech processing as a whole. This…

Sound · Computer Science 2023-09-12 Harunori Kawano, Sota Shimizu

This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition. With optimized features and model training, it achieves 100%…

Sound · Computer Science 2017-03-20 Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju

The mechanism proposed here is for real-time speaker change detection in conversations. It first trains a neural network text-independent speaker classifier using in-domain speaker data. Through the network, features of conversational…

Sound · Computer Science 2017-03-20 Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Aravind Ganapathiraju

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the…

Sound · Computer Science 2020-03-18 Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin

Speaker verification is the process by which a speaker's claim of identity is tested against a claimed speaker by his or her voice. Speaker verification is done by the use of some parameters (features) from the speaker's voice which can be…

Sound · Computer Science 2019-08-16 Bhavana V. S, Pradip K. Das

We study the individuality of the human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-04 Erkam Sinan Tandogan, Husrev Taha Sencar

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the…

Sound · Computer Science 2021-01-07 Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. This paper proposes a hierarchical network with transformer encoders and a memory mechanism to address this problem. The proposed…

Sound · Computer Science 2020-11-02 Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain

We propose SpeakerNet - a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch-normalization, and ReLU layers. This architecture…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-27 Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg

Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal…

Sound · Computer Science 2024-11-25 Irfan Nafiz Shahan, Pulok Ahmed Auvi

Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-19 Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký

In this paper, we propose an innovative approach to perform speaker recognition by fusing two recently introduced deep neural networks (DNNs) namely - SincNet and X-Vector. The idea behind using SincNet filters on the raw speech waveform is…

Computation and Language · Computer Science 2020-04-07 Mayank Tripathi, Divyanshu Singh, Seba Susan

This paper describes an effective unsupervised speaker indexing approach. We suggest a two-stage algorithm to speed up the state-of-the-art algorithm based on the Bayesian Information Criterion (BIC). In the first stage of the merging…

Sound · Computer Science 2010-09-27 Konstantin Biatov

Recent advances in unsupervised speech representation learning discover new approaches and provide new state-of-the-art for diverse types of speech processing tasks. This paper presents an investigation of using wav2vec 2.0 deep speech…

Speaker verification (SV) utilizing features obtained from models pre-trained via self-supervised learning has recently demonstrated impressive performances. However, these pre-trained models (PTMs) usually have a temporal resolution of 20…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-28 Jisoo Myoung, Sangwook Han, Kihyuk Kim, Jong Won Shin