Related papers: Dynamic-TinyBERT: Boost TinyBERT's Inference Effic…

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. A knowledge distillation approach addresses the computational efficiency by self-distilling BERT into a…

Computation and Language · Computer Science 2023-05-11 Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen

TinyBERT: Distilling BERT for Natural Language Understanding

Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently…

Computation and Language · Computer Science 2020-10-19 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

Existing pre-trained language models (PLMs) are often computationally expensive in inference, making them impractical in various resource-limited real-world applications. To address this issue, we propose a dynamic token reduction approach…

Computation and Language · Computer Science 2021-05-26 Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to be applied to mobile devices where computing…

Computation and Language · Computer Science 2023-08-02 Weixin Wu, Hankz Hankui Zhuo

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We…

Computation and Language · Computer Science 2020-04-28 Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, while the superior performance comes with high demand in computational resources, which hinders the application in low-latency IR systems. We…

Information Retrieval · Computer Science 2020-02-18 Wenhao Lu, Jian Jiao, Ruofei Zhang

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need…

Computation and Language · Computer Science 2021-09-27 Cristóbal Eyzaguirre, Felipe del Río, Vladimir Araujo, Álvaro Soto

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains…

Computation and Language · Computer Science 2020-03-03 Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

BiBERT: Accurate Fully Binarized BERT

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks but is also computation and memory expensive. As one of the powerful compression approaches, binarization extremely reduces the…

Computation and Language · Computer Science 2022-03-15 Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu

DynaBERT: Dynamic BERT with Adaptive Width and Depth

The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive. To alleviate this problem, one approach is to compress them for specific tasks before…

Computation and Language · Computer Science 2020-10-12 Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to…

Computation and Language · Computer Science 2020-10-13 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

RefBERT: Compressing BERT by Referencing to Pre-computed Representations

Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of…

Computation and Language · Computer Science 2021-06-17 Xinyi Wang, Haiqin Yang, Liang Zhao, Yang Mo, Jianping Shen

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, for such heavy models can hardly be readily implemented with limited resources. To…

Computation and Language · Computer Science 2020-04-30 Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

TopicBERT for Energy Efficient Document Classification

Prior research notes that BERT's computational cost grows quadratically with sequence length thus leading to longer training times, higher GPU memory constraints and carbon emissions. While recent work seeks to address these scalability…

Computation and Language · Computer Science 2020-11-02 Yatin Chaudhary, Pankaj Gupta, Khushbu Saxena, Vivek Kulkarni, Thomas Runkler, Hinrich Schütze

Simplified TinyBERT: Knowledge Distillation for Document Retrieval

Despite the effectiveness of utilizing the BERT model for document ranking, the high computational cost of such approaches limits their uses. To this end, this paper first empirically investigates the effectiveness of two knowledge…

Information Retrieval · Computer Science 2023-05-05 Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun

NarrowBERT: Accelerating Masked Language Model Pretraining and Inference

Large-scale language model pretraining is a very successful form of self-supervised learning in natural language processing, but it is increasingly expensive to perform as the models and pretraining corpora have become larger over time. We…

Computation and Language · Computer Science 2023-06-07 Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search

Despite transformers' impressive accuracy, their computational cost is often prohibitive to use with limited computational resources. Most previous approaches to improve inference efficiency require a separate model for each possible…

Computation and Language · Computer Science 2021-06-15 Gyuwan Kim, Kyunghyun Cho

TrimBERT: Tailoring BERT for Trade-offs

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science 2022-02-28 Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

TangoBERT: Reducing Inference Cost by using Cascaded Architecture

The remarkable success of large transformer-based models such as BERT, RoBERTa and XLNet in many NLP tasks comes with a large increase in monetary and environmental cost due to their high computational load and energy consumption. In order…

Computation and Language · Computer Science 2022-04-14 Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz