Related papers: Q8BERT: Quantized 8Bit BERT

MKQ-BERT: Quantized BERT with 4-bits Weights and Activations

Recently, pre-trained Transformer based language models, such as BERT, have shown great superiority over the traditional methods in many Natural Language Processing (NLP) tasks. However, the computational cost for deploying these models is…

Machine Learning · Computer Science 2022-03-28 Hanlin Tang, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang

FP8-BERT: Post-Training Quantization for Transformer

Transformer-based models, such as BERT, have been widely applied in a wide range of natural language processing tasks. However, one inevitable side effect is that they require massive memory storage and inference cost when deployed in…

Artificial Intelligence · Computer Science 2023-12-13 Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and…

Machine Learning · Computer Science 2021-09-29 Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via…

Computation and Language · Computer Science 2021-11-03 Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, Dan Roth

bert2BERT: Towards Reusable Pretrained Language Models

In recent years, researchers tend to pre-train ever-larger language models to explore the upper limit of deep models. However, large language model pre-training costs intensive computational resources and most of the models are trained from…

Computation and Language · Computer Science 2021-10-15 Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

Quantized Transformer Language Model Implementations on Edge Devices

Large-scale transformer-based models like the Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications, wherein these models are initially pre-trained with a large…

Computation and Language · Computer Science 2023-10-09 Mohammad Wali Ur Rahman, Murad Mehrab Abrar, Hunter Gibbons Copening, Salim Hariri, Sicong Shao, Pratik Satam, Soheil Salehi

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks. However, the application of these models has been limited due to their huge size. To reduce its size, a popular…

Computation and Language · Computer Science 2020-10-15 Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu

Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT

Large pre-trained language models have recently gained significant traction due to their improved performance on various down-stream tasks like text classification and question answering, requiring only few epochs of fine-tuning. However,…

Computation and Language · Computer Science 2023-09-01 Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Adapting Pre-trained Language Models for Quantum Natural Language Processing

The emerging classical-quantum transfer learning paradigm has brought a decent performance to quantum computational models in many tasks, such as computer vision, by enabling a combination of quantum models and classical pre-trained neural…

Quantum Physics · Physics 2023-02-28 Qiuchi Li, Benyou Wang, Yudong Zhu, Christina Lioma, Qun Liu

I-BERT: Integer-only BERT Quantization

Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference…

Computation and Language · Computer Science 2022-05-02 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

How to Fine-Tune BERT for Text Classification?

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…

Computation and Language · Computer Science 2020-02-06 Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT…

Computation and Language · Computer Science 2021-04-21 Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Prune Once for All: Sparse Pre-Trained Language Models

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase…

Computation and Language · Computer Science 2021-11-11 Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat

Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models

Neural network models have achieved state-of-the-art performance on grapheme-to-phoneme (G2P) conversion. However, their performance relies on large-scale pronunciation dictionaries, which may not be available for a lot of languages.…

Computation and Language · Computer Science 2022-01-27 Lu Dong, Zhi-Qiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to…

Computation and Language · Computer Science 2020-10-13 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Exploring Extreme Parameter Compression for Pre-trained Language Models

Recent work explored the potential of large-scale Transformer-based pre-trained models, especially Pre-trained Language Models (PLMs) in natural language processing. This raises many concerns from various perspectives, e.g., financial costs…

Computation and Language · Computer Science 2022-05-23 Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science 2019-05-28 Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing

BERT is the most recent Transformer-based model that achieves state-of-the-art performance in various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle the issue of huge…

Hardware Architecture · Computer Science 2021-03-05 Zejian Liu, Gang Li, Jian Cheng

Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining it with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han, Bo Pang, Yingnian Wu