Related papers: Regularizing Learnable Feature Extraction for Auto…

200 papers

Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems, as they make it possible to learn features specifically tailored to different tasks. Yet, many of the existing techniques remain…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-15 Peter Vieting, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Hejung Yang, Hong-Goo Kang

End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-15 Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan

Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents presents hard challenges for ASR systems today in both data collection and modeling strategies. The resulting…

Automatic speech recognition (ASR) has in recent years reached a level of accuracy that even outperforms humans in transcribing speech to text. Nevertheless, all current ASR approaches show a certain weakness against ambient noise. To…

Sound · Computer Science 2023-12-22 Christopher Simic, Tobias Bocklet

Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-20 Mohammad Reza Peyghan, Saman Soleimani Roudi, Saeedreza Zouashkiani, Sajjad Amini, Fatemeh Rajabi, Shahrokh Ghaemmaghami

Pre-trained automatic speech recognition (ASR) models have demonstrated strong performance on a variety of tasks. However, their performance can degrade substantially when the input audio comes from different recording channels. While…

Sound · Computer Science 2025-08-25 Kuan-Tang Huang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Matías Pizarro, Dorothea Kolossa, Asja Fischer

With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform. Recently, one line of research has…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-06 Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney

For about four decades, human beings have been dreaming of an intelligent machine that can master natural speech. In its simplest form, this machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech…

Sound · Computer Science 2013-05-08 Urmila Shrawankar, V. M. Thakare

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

State-of-the-art automatic speech recognition (ASR) systems struggle with the lack of data for rare accents. For sufficiently large datasets, neural engines tend to outshine statistical models in most natural language processing problems.…

Sound · Computer Science 2018-07-11 Fedor Kitashov, Elizaveta Svitanko, Debojyoti Dutta

Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream…

Machine Learning · Computer Science 2022-11-08 Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has…

Computation and Language · Computer Science 2021-09-16 Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy

With various face presentation attacks arising under unseen scenarios, face anti-spoofing (FAS) based on domain generalization (DG) has drawn growing attention due to its robustness. Most existing methods utilize DG frameworks to align the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Shubao Liu, Ke-Yue Zhang, Taiping Yao, Mingwei Bi, Shouhong Ding, Jilin Li, Feiyue Huang, Lizhuang Ma

Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module,…

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-03 Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language, and pronunciation models); for example, pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou