Related papers: Dynamic Encoder Size Based on Data-Driven Layer-wi…

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-06-28 · Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman

Dynamic Data Pruning for Automatic Speech Recognition

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data…

Computation and Language · Computer Science · 2024-06-27 · Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-09 · Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-29 · Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science · 2023-06-01 · Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Layer Pruning on Demand with Intermediate CTC

Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since the device computational power and energy consumption requirements are dynamically changed in practice. To overcome the…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-18 · Jaesong Lee, Jingu Kang, Shinji Watanabe

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-19 · Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

Neural Language Model Pruning for Automatic Speech Recognition

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science · 2023-10-06 · Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression

ASR systems are deployed across diverse environments, each with specific hardware constraints. We use supernet training to jointly train multiple encoders of varying sizes, enabling dynamic model size adjustment to fit hardware constraints…

Computation and Language · Computer Science · 2025-02-05 · Jingjing Xu, Eugen Beck, Zijian Yang, Ralf Schlüter

Large-scale Dataset Pruning with Dynamic Uncertainty

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As a result, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science · 2024-06-17 · Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices

Recurrent neural networks (RNNs) achieve cutting-edge performance on a variety of problems. However, due to their high computational and memory demands, deploying RNNs on resource constrained mobile devices is a challenging task. To…

Machine Learning · Computer Science · 2018-06-12 · Jie Zhang, Xiaolong Wang, Dawei Li, Yalin Wang

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-26 · Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET…

Computation and Language · Computer Science · 2021-04-07 · Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

Collaborative Training of Acoustic Encoders for Speech Recognition

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets. When building such different models, we can benefit from training them jointly to take advantage of the…

Computation and Language · Computer Science · 2021-07-15 · Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra

Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models

We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in…

Computation and Language · Computer Science · 2024-11-14 · Felix Stahlberg, Jared Lichtarge, Shankar Kumar

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems

Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e. wider and deeper neural networks), but it comes at significant training cost and infrastructure cost. Model pruning is an…

Information Retrieval · Computer Science · 2021-05-05 · Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal

DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science · 2016-06-15 · Maya Kabkab, Azadeh Alavi, Rama Chellappa

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and…

Computation and Language · Computer Science · 2023-11-28 · Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are…

Sound · Computer Science · 2022-07-21 · Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These…

Machine Learning · Computer Science · 2020-05-15 · Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So