Related papers: DynaBERT: Dynamic BERT with Adaptive Width and Dep…

200 papers

Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, the huge parameter size makes them difficult to be deployed in real-time applications that require quick…

Computation and Language · Computer Science · 2021-01-25 · Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them…

Computation and Language · Computer Science · 2021-06-01 · Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

BERT is a cutting-edge language representation model pre-trained by a large corpus, which achieves superior performances on various natural language understanding tasks. However, a major blocking issue of applying BERT to online services is…

Computation and Language · Computer Science · 2020-10-22 · Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to be applied to mobile devices where computing…

Computation and Language · Computer Science · 2023-08-02 · Weixin Wu, Hankz Hankui Zhuo

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science · 2020-02-11 · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and…

Machine Learning · Computer Science · 2021-09-29 · Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still suffer from insufficient ability to capture subtle differences. Minor noise like word…

Computation and Language · Computer Science · 2023-04-17 · Sirui Wang, Di Liang, Jian Song, Yuntao Li, Wei Wu

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the…

Computation and Language · Computer Science · 2022-12-02 · Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains…

Computation and Language · Computer Science · 2020-03-03 · Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

Large pre-trained language models have recently gained significant traction due to their improved performance on various down-stream tasks like text classification and question answering, requiring only few epochs of fine-tuning. However,…

Computation and Language · Computer Science · 2023-09-01 · Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of…

Computation and Language · Computer Science · 2022-10-19 · Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer…

Computation and Language · Computer Science · 2021-11-19 · Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks but is also computation and memory expensive. As one of the powerful compression approaches, binarization extremely reduces the…

Computation and Language · Computer Science · 2022-03-15 · Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu

Pre-trained language models of the BERT family have defined the state-of-the-arts in a wide range of NLP tasks. However, the performance of BERT-based models is mainly driven by the enormous amount of parameters, which hinders their…

Computation and Language · Computer Science · 2021-03-23 · Yuanxin Liu, Zheng Lin, Fengcheng Yuan

Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers.…

Computation and Language · Computer Science · 2023-07-17 · Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, for such heavy models can hardly be readily implemented with limited resources. To…

Computation and Language · Computer Science · 2020-04-30 · Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT…

Computation and Language · Computer Science · 2025-06-10 · Lola Le Breton, Quentin Fournier, Mariam El Mezouar, John X. Morris, Sarath Chandar

In recent years, BERT has made significant breakthroughs on many natural language processing tasks and attracted great attention. Despite its accuracy gains, the BERT model generally involves a huge number of parameters and needs to be…

Computation and Language · Computer Science · 2021-02-19 · Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive amount of parameters hinders them from efficient deployment on edge devices. Binarization of the BERT models can…

Computation and Language · Computer Science · 2023-05-10 · Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science · 2022-02-28 · Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan