Related papers: BEBERT: Efficient and Robust Binary Ensemble BERT

BiBERT: Accurate Fully Binarized BERT

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks but is also computationally expensive and memory-intensive. As one of the powerful compression approaches, binarization extremely reduces the…

Computation and Language · Computer Science · 2022-03-15 · Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, but this superior performance comes with a high demand for computational resources, which hinders their application in low-latency IR systems. We…

Information Retrieval · Computer Science · 2020-02-18 · Wenhao Lu, Jian Jiao, Ruofei Zhang

BinaryBERT: Pushing the Limit of BERT Quantization

The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT…

Computation and Language · Computer Science · 2021-07-23 · Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King

MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model

Recent advances in natural language processing (NLP) have been driven by pretrained language models like BERT, RoBERTa, T5, and GPT. These models excel at understanding complex texts, but biomedical literature, with its domain-specific…

Computation and Language · Computer Science · 2025-07-28 · K. Sahit Reddy, N. Ragavenderan, Vasanth K., Ganesh N. Naik, Vishalakshi Prabhu, Nagaraja G. S

Efficient Fine-Tuning of Compressed Language Models with Learners

Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the…

Computation and Language · Computer Science · 2022-08-04 · Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science · 2020-02-11 · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computationally expensive and memory-intensive, hindering their deployment to…

Computation and Language · Computer Science · 2020-10-13 · Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

ELBERT: Fast ALBERT with Confidence-Window Based Early Exit

Despite the great success in Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications owing to the large number of parameters and slow…

Computation and Language · Computer Science · 2021-07-02 · Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

BiT: Robustly Binarized Multi-distilled Transformer

Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained…

Machine Learning · Computer Science · 2022-10-04 · Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

Unitary Multi-Margin BERT for Robust Natural Language Processing

Recent developments in adversarial attacks on deep learning leave many mission-critical natural language processing (NLP) systems at risk of exploitation. To address the lack of computationally efficient adversarial defense methods, this…

Computation and Language · Computer Science · 2024-10-17 · Hao-Yuan Chang, Kang L. Wang

Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) lacks…

Computation and Language · Computer Science · 2020-02-18 · Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation

Fine-tuning pre-trained language models like BERT has become an effective way in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure,…

Computation and Language · Computer Science · 2020-02-25 · Yige Xu, Xipeng Qiu, Ligao Zhou, Xuanjing Huang

RefBERT: Compressing BERT by Referencing to Pre-computed Representations

Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of…

Computation and Language · Computer Science · 2021-06-17 · Xinyi Wang, Haiqin Yang, Liang Zhao, Yang Mo, Jianping Shen

NeoBERT: A Next-Generation BERT

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT…

Computation and Language · Computer Science · 2025-06-10 · Lola Le Breton, Quentin Fournier, Mariam El Mezouar, John X. Morris, Sarath Chandar

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation…

Computation and Language · Computer Science · 2021-09-13 · Haoran Xu, Benjamin Van Durme, Kenton Murray

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be…

Computation and Language · Computer Science · 2020-04-16 · Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to apply them on mobile devices where computing…

Computation and Language · Computer Science · 2023-08-02 · Weixin Wu, Hankz Hankui Zhuo

schuBERT: Optimizing Elements of BERT

Transformers (Vaswani et al., 2017) have gradually become a key component for many state-of-the-art natural language representation models. A recent Transformer-based model, BERT (Devlin et al., 2018), achieved state-of-the-art…

Computation and Language · Computer Science · 2020-05-15 · Ashish Khetan, Zohar Karnin

CoRe: An Efficient Coarse-refined Training Framework for BERT

In recent years, BERT has made significant breakthroughs on many natural language processing tasks and attracted great attention. Despite its accuracy gains, the BERT model generally involves a huge number of parameters and needs to be…

Computation and Language · Computer Science · 2021-02-19 · Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

Pre-trained language models have demonstrated superior performance in various natural language processing tasks. However, these models usually contain hundreds of millions of parameters, which limits their practicality because of latency…

Computation and Language · Computer Science · 2022-05-02 · Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen