Related papers: EELBERT: Tiny Models through Dynamic Embeddings

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply to mobile devices where computing…

Computation and Language · Computer Science 2023-08-02 Weixin Wu, Hankz Hankui Zhuo

schuBERT: Optimizing Elements of BERT

Transformers (Vaswani et al., 2017) have gradually become a key component for many state-of-the-art natural language representation models. A recent Transformer-based model, BERT (Devlin et al., 2018), achieved state-of-the-art…

Computation and Language · Computer Science 2020-05-15 Ashish Khetan, Zohar Karnin

Towards Building Efficient Sentence BERT Models using Layer Pruning

This study examines the effectiveness of layer pruning in creating efficient Sentence BERT (SBERT) models. Our goal is to create smaller sentence embedding models that reduce complexity while maintaining strong embedding similarity. We…

Computation and Language · Computer Science 2024-09-24 Anushka Shelke, Riya Savant, Raviraj Joshi

Extremely Small BERT Models from Mixed-Vocabulary Training

Pretrained language models like BERT have achieved good results on NLP tasks, but are impractical on resource-limited devices due to memory footprint. A large fraction of this footprint comes from the input embeddings with large input…

Computation and Language · Computer Science 2021-02-09 Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and…

Machine Learning · Computer Science 2021-09-29 Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

EmbBERT: Attention Under 2 MB Memory

Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. However, their substantial memory and computational requirements…

Computation and Language · Computer Science 2026-03-25 Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

RefBERT: Compressing BERT by Referencing to Pre-computed Representations

Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of…

Computation and Language · Computer Science 2021-06-17 Xinyi Wang, Haiqin Yang, Liang Zhao, Yang Mo, Jianping Shen

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers

In this paper, we propose MCUBERT to enable language models like BERT on tiny microcontroller units (MCUs) through network and scheduling co-optimization. We observe the embedding table contributes to the major storage bottleneck for tiny…

Machine Learning · Computer Science 2024-10-24 Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

BEBERT: Efficient and Robust Binary Ensemble BERT

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive amount of parameters hinders them from efficient deployment on edge devices. Binarization of the BERT models can…

Computation and Language · Computer Science 2023-05-10 Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang

iBERT: Interpretable Embeddings via Sense Decomposition

We present iBERT (interpretable-BERT), an encoder to produce inherently interpretable and controllable embeddings - designed to modularize and expose the discriminative cues present in language, such as semantic or stylistic structure. Each…

Computation and Language · Computer Science 2026-01-27 Vishal Anand, Milad Alshomary, Kathleen McKeown

Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings

The current state-of-the-art task-oriented semantic parsing models use BERT or RoBERTa as pretrained encoders; these models have huge memory footprints. This poses a challenge to their deployment for voice assistants such as Amazon Alexa…

Computation and Language · Computer Science 2020-10-13 Prafull Prakash, Saurabh Kumar Shashidhar, Wenlong Zhao, Subendhu Rongali, Haidar Khan, Michael Kayser

Easy and Efficient Transformer : Scalable Inference Solution For large NLP model

Recently, large-scale transformer-based models have been proven to be effective over various tasks across many domains. Nevertheless, applying them in industrial production requires tedious and heavy work to reduce inference costs. To fill…

Computation and Language · Computer Science 2022-05-25 Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer…

Computation and Language · Computer Science 2021-11-19 Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim

Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information…

Computation and Language · Computer Science 2021-01-01 Jacob Turton, David Vinson, Robert Elliott Smith

Deep Learning Meets Projective Clustering

A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A\in\mathbb{R}^{n\times d}$, compute its rank-$j$ approximation $A_j$ via SVD, and then factor $A_j$ into a pair of matrices that correspond to…

Machine Learning · Computer Science 2020-10-12 Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman

Quantized Transformer Language Model Implementations on Edge Devices

Large-scale transformer-based models like the Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications, wherein these models are initially pre-trained with a large…

Computation and Language · Computer Science 2023-10-09 Mohammad Wali Ur Rahman, Murad Mehrab Abrar, Hunter Gibbons Copening, Salim Hariri, Sicong Shao, Pratik Satam, Soheil Salehi

I-BERT: Integer-only BERT Quantization

Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference…

Computation and Language · Computer Science 2022-05-02 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains…

Computation and Language · Computer Science 2020-03-03 Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

TrimBERT: Tailoring BERT for Trade-offs

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science 2022-02-28 Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan