Related papers: Regularizing Learnable Feature Extraction for Auto…

Unified Learnable 2D Convolutional Feature Extraction for ASR

Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems, as they make it possible to learn features specifically tailored to different tasks. Yet, many of the existing techniques remain…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-15 Peter Vieting, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Hejung Yang, Hong-Goo Kang

An Investigation of End-to-End Models for Robust Speech Recognition

End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-15 Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan

Accented Speech Recognition: A Survey

Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents presents hard challenges for ASR systems today in both data collection and modeling strategies. The resulting…

Computation and Language · Computer Science 2021-06-03 Arthur Hinsvark, Natalie Delworth, Miguel Del Rio, Quinten McNamara, Joshua Dong, Ryan Westerman, Michelle Huang, Joseph Palakapilly, Jennifer Drexler, Ilya Pirkin, Nishchal Bhandari, Miguel Jette

Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models

Automatic speech recognition (ASR) has reached a level of accuracy in recent years that even outperforms humans in transcribing speech to text. Nevertheless, all current ASR approaches show a certain weakness against ambient noise. To…

Sound · Computer Science 2023-12-22 Christopher Simic, Tobias Bocklet

Non-Intrusive Automatic Speech Recognition Refinement: A Survey

Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-20 Mohammad Reza Peyghan, Saman Soleimani Roudi, Saeedreza Zouashkiani, Sajjad Amini, Fatemeh Rajabi, Shahrokh Ghaemmaghami

Revealing the Role of Audio Channels in ASR Performance Degradation

Pre-trained automatic speech recognition (ASR) models have demonstrated strong performance on a variety of tasks. However, their performance can degrade substantially when the input audio comes from different recording channels. While…

Sound · Computer Science 2025-08-25 Kuan-Tang Huang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

Robustifying automatic speech recognition by extracting slowly varying features

In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Matías Pizarro, Dorothea Kolossa, Asja Fischer

On Architectures and Training for Raw Waveform Feature Extraction in ASR

With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform. Recently, one line of research has…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-06 Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney

Speech Enhancement Modeling Towards Robust Speech Recognition System

For about four decades, human beings have been dreaming of an intelligent machine that can master natural speech. In its simplest form, this machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech…

Sound · Computer Science 2013-05-08 Urmila Shrawankar, V. M. Thakare

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

Foreign English Accent Adjustment by Learning Phonetic Patterns

State-of-the-art automatic speech recognition (ASR) systems struggle with the lack of data for rare accents. For sufficiently large datasets, neural engines tend to outshine statistical models in most natural language processing problems.…

Sound · Computer Science 2018-07-11 Fedor Kitashov, Elizaveta Svitanko, Debojyoti Dutta

Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream…

Machine Learning · Computer Science 2022-11-08 Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech

Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has…

Computation and Language · Computer Science 2021-09-16 Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy

Adaptive Normalized Representation Learning for Generalizable Face Anti-Spoofing

With various face presentation attacks arising under unseen scenarios, face anti-spoofing (FAS) based on domain generalization (DG) has drawn growing attention due to its robustness. Most existing methods utilize DG frameworks to align the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Shubao Liu, Ke-Yue Zhang, Taiping Yao, Mingwei Bi, Shouhong Ding, Jilin Li, Feiyue Huang, Lizhuang Ma

Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module,…

Sound · Computer Science 2024-08-13 Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-03 Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou