Related papers: Integer-only Zero-shot Quantization for Efficient …

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on…

Sound · Computer Science 2025-08-05 Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang

Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

The performances of automatic speech recognition (ASR) systems are usually evaluated by the metric word error rate (WER) when the manually transcribed data are provided, which are, however, expensively available in the real scenario. In…

Computation and Language · Computer Science 2020-09-01 Kai Fan, Jiayi Wang, Bo Li, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally…

Sound · Computer Science 2025-05-28 Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu

Sharpness-Aware Data Generation for Zero-shot Quantization

Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing…

Machine Learning · Computer Science 2025-10-09 Dung Hoang-Anh, Cuong Pham Trung Le, Jianfei Cai, Thanh-Toan Do

I-BERT: Integer-only BERT Quantization

Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference…

Computation and Language · Computer Science 2022-05-02 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

Speaker Adaptation for Quantised End-to-End ASR Models

End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce…

Sound · Computer Science 2024-08-09 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Quantization for OpenAI's Whisper Models: A Comparative Analysis

Automated speech recognition (ASR) models have gained prominence for applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two model variants: one optimized for live speech streaming and…

Sound · Computer Science 2025-03-14 Allison Andreyev

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based…

Sound · Computer Science 2024-05-03 Aditya Chakravarty

Zero-Shot Dynamic Quantization for Transformer Inference

We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure, or they require an…

Computation and Language · Computer Science 2022-11-18 Yousef El-Kurdi, Jerry Quinn, Avirup Sil

Enhancing Quantised End-to-End ASR Models via Personalisation

Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes…

Sound · Computer Science 2023-09-19 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-06 Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on…

Computation and Language · Computer Science 2020-03-25 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora

In recent years, automatic speech recognition (ASR) models greatly improved transcription performance both in clean, low noise, acoustic conditions and in reverberant environments. However, all these systems rely on the availability of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-18 Francesco Nespoli, Daniel Barreda, Patrick A. Naylor

4-bit Quantization of LSTM-based Speech Recognition Models

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models…

Computation and Language · Computer Science 2021-08-30 Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

Integer-Only Neural Network Quantization Scheme Based on Shift-Batch-Normalization

Neural networks are very popular in many areas, but great computing complexity makes it hard to run neural networks on devices with limited resources. To address this problem, quantization methods are used to reduce model size and…

Machine Learning · Computer Science 2021-06-02 Qingyu Guo, Yuan Wang, Xiaoxin Cui

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher

Model quantization is considered as a promising method to greatly reduce the resource requirements of deep neural networks. To deal with the performance drop induced by quantization errors, a popular method is to use training data to…

Computer Vision and Pattern Recognition · Computer Science 2022-04-04 Kanghyun Choi, Hye Yoon Lee, Deokki Hong, Joonsang Yu, Noseong Park, Youngsok Kim, Jinho Lee

Efficient Speech Representation Learning with Low-Bit Quantization

With the development of hardware for machine learning, newer models often come at the cost of both increased sizes and computational complexity. In an effort to improve the efficiency for these models, we apply and investigate recent…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-03 Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed

On the efficient representation and execution of deep acoustic models

In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed…

Machine Learning · Computer Science 2016-12-20 Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy…

Computation and Language · Computer Science 2023-10-23 Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal