Related papers: TR-BERT: Dynamic Token Reduction for Accelerating …

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to be applied to mobile devices where computing…

Computation and Language · Computer Science 2023-08-02 Weixin Wu , Hankz Hankui Zhuo

AdapLeR: Speeding up Inference by Adaptive Length Reduction

Pre-trained language models have shown stellar performance in various downstream tasks. But, this usually comes at the cost of high latency and computation, hindering their usage in resource-limited settings. In this work, we propose a…

Computation and Language · Computer Science 2022-03-18 Ali Modarressi , Hosein Mohebbi , Mohammad Taher Pilehvar

Token Dropping for Efficient BERT Pretraining

Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT,…

Computation and Language · Computer Science 2022-03-25 Le Hou , Richard Yuanzhe Pang , Tianyi Zhou , Yuexin Wu , Xinying Song , Xiaodan Song , Denny Zhou

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer…

Computation and Language · Computer Science 2021-11-19 Shira Guskin , Moshe Wasserblat , Ke Ding , Gyuwan Kim

Accelerating BERT Inference for Sequence Labeling via Early-Exit

Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although the pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their…

Computation and Language · Computer Science 2021-06-15 Xiaonan Li , Yunfan Shao , Tianxiang Sun , Hang Yan , Xipeng Qiu , Xuanjing Huang

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need…

Computation and Language · Computer Science 2021-09-27 Cristóbal Eyzaguirre , Felipe del Río , Vladimir Araujo , Álvaro Soto

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection

Transformer-based language models such as BERT have achieved the state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of works use various heuristics to successively shorten sequence length…

Computation and Language · Computer Science 2022-03-29 Xin Huang , Ashish Khetan , Rene Bidart , Zohar Karnin

Thinking Augmented Pre-training

This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an…

Computation and Language · Computer Science 2025-10-20 Liang Wang , Nan Yang , Shaohan Huang , Li Dong , Furu Wei

Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts…

Computation and Language · Computer Science 2024-06-04 Jungmin Yun , Mihyeon Kim , Youngbin Kim

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We…

Computation and Language · Computer Science 2020-04-28 Ji Xin , Raphael Tang , Jaejun Lee , Yaoliang Yu , Jimmy Lin

Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer

The BERT model has shown significant success on various natural language processing tasks. However, due to the heavy model size and high computational cost, the model suffers from high latency, which is fatal to its deployments on…

Computation and Language · Computer Science 2021-02-22 Seohyeong Jeong , Nojun Kwak

TinyBERT: Distilling BERT for Natural Language Understanding

Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently…

Computation and Language · Computer Science 2020-10-19 Xiaoqi Jiao , Yichun Yin , Lifeng Shang , Xin Jiang , Xiao Chen , Linlin Li , Fang Wang , Qun Liu

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and…

Machine Learning · Computer Science 2020-09-09 Saurabh Goyal , Anamitra R. Choudhury , Saurabh M. Raje , Venkatesan T. Chakaravarthy , Yogish Sabharwal , Ashish Verma

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, for such heavy models can hardly be readily implemented with limited resources. To…

Computation and Language · Computer Science 2020-04-30 Weijie Liu , Peng Zhou , Zhe Zhao , Zhiruo Wang , Haotang Deng , Qi Ju

BERT Loses Patience: Fast and Robust Inference with Early Exit

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model…

Computation and Language · Computer Science 2020-10-23 Wangchunshu Zhou , Canwen Xu , Tao Ge , Julian McAuley , Ke Xu , Furu Wei

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan , Mingda Chen , Sebastian Goodman , Kevin Gimpel , Piyush Sharma , Radu Soricut

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting…

Computation and Language · Computer Science 2023-11-01 Aman Jaiswal , Evangelos Milios

FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction

As large language models (LLMs) become increasingly powerful, the sequential nature of autoregressive generation creates a fundamental throughput bottleneck that limits the practical deployment. While Multi-Token Prediction (MTP) has…

Machine Learning · Computer Science 2025-09-24 Yuxuan Cai , Xiaozhuan Liang , Xinghua Wang , Jin Ma , Haijin Liang , Jinwen Luo , Xinyu Zuo , Lisheng Duan , Yuyang Yin , Xi Chen

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computation resources and extremely long training time for both…

Computation and Language · Computer Science 2021-06-09 Xiaohan Chen , Yu Cheng , Shuohang Wang , Zhe Gan , Zhangyang Wang , Jingjing Liu

IM-BERT: Enhancing Robustness of BERT through the Implicit Euler Method

Pre-trained Language Models (PLMs) have achieved remarkable performance on diverse NLP tasks through pre-training and fine-tuning. However, fine-tuning the model with a large number of parameters on limited downstream datasets often leads…

Computation and Language · Computer Science 2025-05-13 Mihyeon Kim , Juhyoung Park , Youngbin Kim