Related papers: Y-Vector: Multiscale Waveform Encoder for Speaker …

200 papers

In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders in order to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder is first constructed using a…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-17 · Qiuchen Huang, Yang Ai, Zhenhua Ling

Developing a good speaker embedding has received tremendous interest in the speech community, with representations such as i-vector and d-vector demonstrating remarkable performance across various tasks. Despite their widespread adoption, a…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-12-23 · Shuai Wang, Yanmin Qian, Kai Yu

Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-18 · Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu

Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms. For example, RawNet extracts speaker embeddings from raw waveforms, which simplifies the process pipeline and…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-05-08 · Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

Speaker verification aims to verify whether an input speech corresponds to the claimed speaker, and conventionally, this kind of system is deployed based on single-stream scenario, wherein the feature extractor operates in full frequency…

Sound · Computer Science · 2025-09-03 · Wei Yao, Shen Chen, Jiamin Cui, Yaolin Lou

This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-01-15 · Bin Gu, Wu Guo

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components,…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-10-20 · Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. This paper proposes a hierarchical network with transformer encoders and memory mechanism to address this problem. The proposed…

Sound · Computer Science · 2020-11-02 · Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain

Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from…

Computation and Language · Computer Science · 2016-04-06 · Zhenyao Zhu, Jesse H. Engel, Awni Hannun

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-08-12 · Mirco Ravanelli, Yoshua Bengio

Speaker embeddings are widely used in speaker verification systems and other applications where it is useful to characterise the voice of a speaker with a fixed-length vector. These embeddings tend to be treated as "black box" encodings,…

Sound · Computer Science · 2025-10-21 · Mark Huckvale

We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) An enrolment network designed to craft…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-03 · Akam Rahimi, Triantafyllos Afouras, Andrew Zisserman

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-16 · Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

One of the most popular speaker embeddings is x-vectors, which are obtained from an architecture that gradually builds a larger temporal context with layers. In this paper, we propose to derive speaker embeddings from Transformer's encoder…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-14 · N J Metilda Sagaya Mary, S Umesh, Sandesh V Katta

Single-channel speech separation in time domain and frequency domain has been widely studied for voice-driven applications over the past few years. Most of previous works assume known number of speakers in advance, however, which is not…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-02 · Yiming Xiao, Haijian Zhang

Deep learning has dramatically improved the performance of sounds recognition. However, learning acoustic models directly from the raw waveform is still challenging. Current waveform-based models generally use time-domain convolutional…

Sound · Computer Science · 2018-03-29 · Boqing Zhu, Changjian Wang, Feng Liu, Jin Lei, Zengquan Lu, Yuxing Peng

In this paper, we propose an innovative approach to perform speaker recognition by fusing two recently introduced deep neural networks (DNNs) namely - SincNet and X-Vector. The idea behind using SincNet filters on the raw speech waveform is…

Computation and Language · Computer Science · 2020-04-07 · Mayank Tripathi, Divyanshu Singh, Seba Susan

We propose an approach to extract speaker embeddings that are robust to speaking style variations in text-independent speaker verification. Typically, speaker embedding extraction includes training a DNN for speaker classification and using…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-06-29 · Amber Afshan, Abeer Alwan

In recent years, using raw waveforms as input for deep networks has been widely explored for the speaker verification system. For example, RawNet and RawNet2 extracted speaker's feature embeddings from waveforms automatically for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-08 · Jin Li, Nan Yan, Lan Wang

Verifying the identity of a speaker is crucial in modern human-machine interfaces, e.g., to ensure privacy protection or to enable biometric authentication. Classical speaker verification (SV) approaches estimate a fixed-dimensional…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-06-29 · Ahmad Aloradi, Wolfgang Mack, Mohamed Elminshawi, Emanuël A. P. Habets