Related papers: Dynamic Data Pruning for Automatic Speech Recognit…

200 papers

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Automatic speech recognition (ASR) systems play a key role in many commercial products including voice assistants. Typically, they require large amounts of clean speech data for training which gives an undue advantage to large organizations…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-21 Bhavya Ghai, Buvana Ramanan, Klaus Mueller

Varying-size models are often required to deploy ASR systems under different hardware and/or application constraints such as memory and latency. To avoid redundant training and optimization efforts for individual models of different sizes,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-30 Jingjing Xu, Wei Zhou, Zijian Yang, Eugen Beck, Ralf Schlueter

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science 2023-06-01 Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

With computers getting more and more powerful and integrated in our daily lives, the focus is increasingly shifting towards more human-friendly interfaces, making Automatic Speech Recognition (ASR) a central player as the ideal means of…

Sound · Computer Science 2021-01-25 Dennis Pinto, Jose-María Arnau, Antonio González

Supervised training of speech recognition models requires access to transcribed audio data, which often is not possible due to confidentiality issues. Our approach to this problem is to generate synthetic audio from a text-only corpus using…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-01 Yanis Perrin, Gilles Boulianne

Data augmentation is a widely adopted technique utilized to improve the robustness of automatic speech recognition (ASR). Employing a fixed data augmentation strategy for all training data is a common practice. However, it is important to…

Sound · Computer Science 2024-12-03 Hongxuan Lu, Biao Li

Modern ASR systems are typically trained on large-scale pseudo-labeled, in-the-wild data spanning multiple domains. While such heterogeneous data benefit generalist models designed for broad deployment, they pose challenges for specialist…

Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less…

Sound · Computer Science 2026-04-24 Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-26 Xiaodong Cui, Wei Zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not…

Computation and Language · Computer Science 2017-06-13 Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Matías Pizarro, Dorothea Kolossa, Asja Fischer

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser, Federico Raue, Andreas Dengel

Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is computationally demanding and time-consuming. We consider the…

Machine Learning · Computer Science 2023-04-13 Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-25 Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou