Related papers: DeeBERT: Dynamic Early Exiting for Accelerating BE…

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to be applied to mobile devices where computing…

Computation and Language · Computer Science · 2023-08-02 · Weixin Wu, Hankz Hankui Zhuo

Elbert: Fast Albert with Confidence-Window Based Early Exit

Despite the great success in Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to the large number of parameters and slow…

Computation and Language · Computer Science · 2021-07-02 · Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

RomeBERT: Robust Training of Multi-Exit BERT

BERT has achieved superior performances on Natural Language Understanding (NLU) tasks. However, BERT possesses a large number of parameters and demands certain resources to deploy. For acceleration, Dynamic Early Exiting for BERT (DeeBERT)…

Computation and Language · Computer Science · 2021-01-26 · Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang

CEEBERT: Cross-Domain Inference in Early Exit BERT

Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer in inference latency due to their large size. To address this…

Computation and Language · Computer Science · 2024-05-27 · Divya Jyoti Bajpai, Manjesh Kumar Hanawal

BERT Loses Patience: Fast and Robust Inference with Early Exit

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model…

Computation and Language · Computer Science · 2020-10-23 · Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

Dynamic early exiting has been proven to improve the inference speed of the pre-trained language model like BERT. However, all samples must go through all consecutive layers before early exiting and more complex samples usually go through…

Computation and Language · Computer Science · 2023-05-09 · Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to…

Hardware Architecture · Computer Science · 2021-09-07 · Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains…

Computation and Language · Computer Science · 2020-03-03 · Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

The Right Tool for the Job: Matching Model and Instance Complexities

As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual…

Computation and Language · Computer Science · 2020-05-12 · Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer…

Computation and Language · Computer Science · 2021-11-19 · Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim

DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks

Early exiting has demonstrated its effectiveness in accelerating the inference of pre-trained language models like BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods only consider local…

Machine Learning · Computer Science · 2025-12-30 · Jianing He, Qi Zhang, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu, Longbing Cao

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need…

Computation and Language · Computer Science · 2021-09-27 · Cristóbal Eyzaguirre, Felipe del Río, Vladimir Araujo, Álvaro Soto

Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT

With the development of deep learning and Transformer-based pre-trained models like BERT, the accuracy of many NLP tasks has been dramatically improved. However, the large number of parameters and computations also pose challenges for their…

Computation and Language · Computer Science · 2022-12-07 · Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, for such heavy models can hardly be readily implemented with limited resources. To…

Computation and Language · Computer Science · 2020-04-30 · Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR). However, these models usually require an expensive computational cost…

Computation and Language · Computer Science · 2024-06-21 · Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade

Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by emitting predictions in internal layers without passing through the entire model. In this paper, we empirically analyze the working mechanism of…

Computation and Language · Computer Science · 2021-09-06 · Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

Existing pre-trained language models (PLMs) are often computationally expensive in inference, making them impractical in various resource-limited real-world applications. To address this issue, we propose a dynamic token reduction approach…

Computation and Language · Computer Science · 2021-05-26 · Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computation resources and extremely long training time for both…

Computation and Language · Computer Science · 2021-06-09 · Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

TinyBERT: Distilling BERT for Natural Language Understanding

Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently…

Computation and Language · Computer Science · 2020-10-19 · Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

PALBERT: Teaching ALBERT to Ponder

Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different…

Machine Learning · Computer Science · 2023-05-19 · Nikita Balagansky, Daniil Gavrilov