Related papers: Learnable MFCCs for Speaker Verification

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-31 Xuechen Liu, Md Sahidullah, Tomi Kinnunen

A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition

In this paper, we propose a novel family of windowing technique to compute Mel Frequency Cepstral Coefficient (MFCC) for automatic speaker recognition from speech. The proposed method is based on fundamental property of discrete time…

Computer Vision and Pattern Recognition · Computer Science 2015-06-05 Md. Sahidullah, Goutam Saha

Speaker Recognition using Deep Belief Networks

Short time spectral features such as mel frequency cepstral coefficients (MFCCs) have been previously deployed in state of the art speaker recognition systems, however lesser heed has been paid to short term spectral features that can be…

Audio and Speech Processing · Electrical Eng. & Systems 2018-05-24 Adrish Banerjee, Akash Dubey, Abhishek Menon, Shubham Nanda, Gora Chand Nandi

DNN Filter Bank Cepstral Coefficients for Spoofing Detection

With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank…

Sound · Computer Science 2017-02-14 Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Modified Mel Filter Bank to Compute MFCC of Subsampled Speech

Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used speech features in most speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We…

Computation and Language · Computer Science 2014-10-29 Kiran Kumar Bhuvanagiri, Sunil Kumar Kopparapu

Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environment. Based on the time-frequency multi-resolution property of wavelet…

Sound · Computer Science 2010-03-31 Mahmoud I. Abdalla, Hanaa S. Ali

A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs

An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Then different classifiers are compared based on the MFCC feature. For each…

Sound · Computer Science 2015-02-02 Zichen Ma, Ernest Fokoue

Text Independent Speaker Identification System for Access Control

Even human intelligence system fails to offer 100% accuracy in identifying speeches from a specific individual. Machine intelligence is trying to mimic humans in speaker identification problems through various approaches to speech feature…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-30 Oluyemi E. Adetoyi

Optimizing Multi-Taper Features for Deep Speaker Verification

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Even if past…

Sound · Computer Science 2021-10-27 Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification. However, as a feature extractor with long-term operations…

Sound · Computer Science 2021-09-27 Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Revisiting MFCCs: Evidence for Spectral-Prosodic Coupling

Mel-frequency cepstral coefficients (MFCCs) are an important feature in speech processing. A deeper understanding of their properties can contribute to the work that is being done with both classical and deep learning models. This study…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-08 Vitor Magno de O. S. Bezerra, Gabriel F. A. Bastos, Jugurta Montalvão

Speaker Identification using MFCC-Domain Support Vector Machine

Speech recognition and speaker identification are important for authentication and verification in security purpose, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent.…

Machine Learning · Computer Science 2010-09-28 S. M. Kamruzzaman, A. N. M. Rezaul Karim, Md. Saiful Islam, Md. Emdadul Haque

Frequency-centroid features for word recognition of non-native English speakers

The objective of this work is to investigate complementary features which can aid the quintessential Mel frequency cepstral coefficients (MFCCs) in the task of closed, limited set word recognition for non-native English speakers of…

Sound · Computer Science 2022-06-16 Pierre Berjon, Rajib Sharma, Avishek Nag, Soumyabrata Dev

Optimization of data-driven filterbank for automatic speaker verification

Most of the speech processing applications use triangular filters spaced in mel-scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from a given speech data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-22 Susanta Sarangi, Md Sahidullah, Goutam Saha

Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention

Speech Emotion Recognition (SER) traditionally relies on auditory data analysis for emotion classification. Several studies have adopted different methods for SER. However, existing SER methods often struggle to capture subtle emotional…

Sound · Computer Science 2026-01-23 HyeYoung Lee, Muhammad Nadeem

Deep Speaker Feature Learning for Text-independent Speaker Verification

Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the…

Sound · Computer Science 2017-05-11 Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang

Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech

Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used speech features in most speech and speaker recognition applications. In this paper, we study the effect of resampling a speech signal on these speech features. We first…

Sound · Computer Science 2014-10-28 Laxmi Narayana M., Sunil Kumar Kopparapu

Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions

This paper proposes a speech emotion recognition method based on speech features and speech transcriptions (text). Speech features such as Spectrogram and Mel-frequency Cepstral Coefficients (MFCC) help retain emotion-related low-level…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-14 Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla

Comparative Analysis of Mel-Frequency Cepstral Coefficients and Wavelet Based Audio Signal Processing for Emotion Detection and Mental Health Assessment in Spoken Speech

The intersection of technology and mental health has spurred innovative approaches to assessing emotional well-being, particularly through computational techniques applied to audio data analysis. This study explores the application of…

Sound · Computer Science 2024-12-17 Idoko Agbo, Dr Hoda El-Sayed, M. D Kamruzzan Sarker