Related papers: Dynamic Encoder Size Based on Data-Driven Layer-wi…

200 papers

In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without…

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data…

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-09 · Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-29 · Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science · 2023-06-01 · Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since the device computational power and energy consumption requirements are dynamically changed in practice. To overcome the…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-18 · Jaesong Lee, Jingu Kang, Shinji Watanabe

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-19 · Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science · 2023-10-06 · Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

ASR systems are deployed across diverse environments, each with specific hardware constraints. We use supernet training to jointly train multiple encoders of varying sizes, enabling dynamic model size adjustment to fit hardware constraints…

Computation and Language · Computer Science · 2025-02-05 · Jingjing Xu, Eugen Beck, Zijian Yang, Ralf Schlüter

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science · 2024-06-17 · Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

Recurrent neural networks (RNNs) achieve cutting-edge performance on a variety of problems. However, due to their high computational and memory demands, deploying RNNs on resource constrained mobile devices is a challenging task. To…

Machine Learning · Computer Science · 2018-06-12 · Jie Zhang, Xiaolong Wang, Dawei Li, Yalin Wang

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-26 · Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET…

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets. When building such different models, we can benefit from training them jointly to take advantage of the…

Computation and Language · Computer Science · 2021-07-15 · Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra

We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in…

Computation and Language · Computer Science · 2024-11-14 · Felix Stahlberg, Jared Lichtarge, Shankar Kumar

Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e., wider and deeper neural networks), but these gains come at significant training and infrastructure cost. Model pruning is an…

Information Retrieval · Computer Science · 2021-05-05 · Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science · 2016-06-15 · Maya Kabkab, Azadeh Alavi, Rama Chellappa

Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and…

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are…

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These…

Machine Learning · Computer Science · 2020-05-15 · Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So