Related papers: 2-bit Conformer quantization for automatic speech …

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-06 Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally…

Sound · Computer Science 2025-05-28 Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal

Efficient Speech Representation Learning with Low-Bit Quantization

With the development of hardware for machine learning, newer models often come at the cost of both increased sizes and computational complexity. In effort to improve the efficiency for these models, we apply and investigate recent…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-03 Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout…

Sound · Computer Science 2022-10-21 Yufeng Yang, Peidong Wang, DeLiang Wang

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science 2023-06-01 Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Enhancing Quantised End-to-End ASR Models via Personalisation

Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes…

Sound · Computer Science 2023-09-19 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Quantizing Whisper-small: How design choices affect ASR performance

Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-22 Arthur Söhler, Julian Irigoyen, Andreas Søeborg Kirkedal

4-bit Quantization of LSTM-based Speech Recognition Models

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models…

Computation and Language · Computer Science 2021-08-30 Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on…

Computation and Language · Computer Science 2020-03-25 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

For on-device automatic speech recognition (ASR), quantization aware training (QAT) is ubiquitous to achieve the trade-off between model predictive performance and efficiency. Among existing QAT methods, one major drawback is that the…

Sound · Computer Science 2022-11-02 Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

Speaker Adaptation for Quantised End-to-End ASR Models

End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce…

Sound · Computer Science 2024-08-09 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Quantization for OpenAI's Whisper Models: A Comparative Analysis

Automated speech recognition (ASR) models have gained prominence for applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two model variants: one optimized for live speech streaming and…

Sound · Computer Science 2025-03-14 Allison Andreyev

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on…

Sound · Computer Science 2025-08-05 Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models

This paper presents a novel mixed-precision quantization approach for speech foundation models that tightly integrates mixed-precision learning and quantized model parameter estimation into one single model compression stage. Experiments…

Sound · Computer Science 2025-01-14 Haoning Xu, Zhaoqing Li, Zengrui Jin, Huimeng Wang, Youjun Chen, Guinan Li, Mengzhe Geng, Shujie Hu, Jiajun Deng, Xunying Liu

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Modern automatic speech recognition (ASR) models, such as OpenAI's Whisper, rely on deep encoder-decoder architectures, and their encoders are a critical bottleneck for efficient deployment due to high computational intensity. We introduce…

Machine Learning · Computer Science 2025-08-26 Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization…

Sound · Computer Science 2024-06-17 Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

Efficient infusion of self-supervised representations in Automatic Speech Recognition

Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some…

Computation and Language · Computer Science 2024-04-22 Darshan Prabhu, Sai Ganesh Mirishkar, Pankaj Wasnik

BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. To reduce the model size, the proposed model introduces two techniques. First, we introduce quantization-aware training (QAT), which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-05 Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto