Related papers: Dynamic Data Pruning for Automatic Speech Recognit…

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

Automatic speech recognition (ASR) systems play a key role in many commercial products including voice assistants. Typically, they require large amounts of clean speech data for training which gives an undue advantage to large organizations…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-21 Bhavya Ghai, Buvana Ramanan, Klaus Mueller

Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition

Varying-size models are often required to deploy ASR systems under different hardware and/or application constraints such as memory and latency. To avoid redundant training and optimization efforts for individual models of different sizes,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-30 Jingjing Xu, Wei Zhou, Zijian Yang, Eugen Beck, Ralf Schlueter

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science 2023-06-01 Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Unsupervised Data Selection via Discrete Speech Representation for ASR

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

Exploiting Beam Search Confidence for Energy-Efficient Speech Recognition

With computers getting more and more powerful and integrated in our daily lives, the focus is increasingly shifting towards more human-friendly interfaces, making Automatic Speech Recognition (ASR) a central player as the ideal means of…

Sound · Computer Science 2021-01-25 Dennis Pinto, Jose-María Arnau, Antonio González

Towards Improved Speech Recognition through Optimized Synthetic Data Generation

Supervised training of speech recognition models requires access to transcribed audio data, which often is not possible due to confidentiality issues. Our approach to this problem is to generate synthetic audio from a text-only corpus using…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-01 Yanis Perrin, Gilles Boulianne

Sample adaptive data augmentation with progressive scheduling

Data augmentation is a widely adopted technique utilized to improve the robustness of automatic speech recognition (ASR). Employing a fixed data augmentation strategy for all training data is a common practice. However, it is important to…

Sound · Computer Science 2024-12-03 Hongxuan Lu, Biao Li

Which Data Matter? Embedding-Based Data Selection for Speech Recognition

Modern ASR systems are typically trained on large-scale pseudo-labeled, in-the-wild data spanning multiple domains. While such heterogeneous data benefit generalist models designed for broad deployment, they pose challenges for specialist…

Sound · Computer Science 2026-03-16 Zakaria Aldeneh, Skyler Seto, Maureen de Seyssel, Jie Chi, Zijin Gu, Takuya Higuchi, Jee-weon Jung, Shinji Watanabe, David Grangier, Barry-John Theobald, Tatiana Likhomanenko

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less…

Sound · Computer Science 2026-04-24 Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition

The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-26 Xiaodong Cui, Wei Zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not…

Computation and Language · Computer Science 2017-06-13 Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Robustifying automatic speech recognition by extracting slowly varying features

In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Matías Pizarro, Dorothea Kolossa, Asja Fischer

A Study in Dataset Pruning for Image Super-Resolution

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser, Federico Raue, Andreas Dengel

Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition

Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is computationally demanding and time-consuming. We consider the…

Machine Learning · Computer Science 2023-04-13 Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-25 Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou