Related papers: schuBERT: Optimizing Elements of BERT

200 papers

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science · 2020-02-11 · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Transformer has been widely used thanks to its ability to capture sequence information in an efficient way. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper,…

Computation and Language · Computer Science · 2020-02-17 · Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science · 2019-05-28 · Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Transformer-based architectures have become de facto models used for a range of Natural Language Processing tasks. In particular, BERT-based models achieved significant accuracy gains for GLUE tasks, CoNLL-03 and SQuAD. However, BERT…

Computation and Language · Computer Science · 2021-04-21 · Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and…

Machine Learning · Computer Science · 2021-09-29 · Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Attention based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and…

Computation and Language · Computer Science · 2021-06-11 · Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi

Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-06-22 · Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

Machine question answering is an essential yet challenging task in natural language processing. Recently, Pre-trained Contextual Embeddings (PCE) models like Bidirectional Encoder Representations from Transformers (BERT) and A Lite BERT…

Computation and Language · Computer Science · 2021-10-20 · Shilun Li, Renee Li, Veronica Peng

Recent work explored the potential of large-scale Transformer-based pre-trained models, especially Pre-trained Language Models (PLMs) in natural language processing. This raises many concerns from various perspectives, e.g., financial costs…

Computation and Language · Computer Science · 2022-05-23 · Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu

We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e.…

Computation and Language · Computer Science · 2023-11-01 · Gabrielle Cohn, Rishika Agarwal, Deepanshu Gupta, Siddharth Patwardhan

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as…

Computation and Language · Computer Science · 2019-09-30 · Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation- and memory-expensive, hindering their deployment to…

Computation and Language · Computer Science · 2020-10-13 · Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of…

Computation and Language · Computer Science · 2022-10-19 · Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive number of parameters hinders their efficient deployment on edge devices. Binarization of the BERT models can…

Computation and Language · Computer Science · 2023-05-10 · Jiayi Tian, Chao Fang, Haonan Wang, Zhongfeng Wang

Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal…

Computation and Language · Computer Science · 2022-05-04 · Chan-Jan Hsu, Hung-yi Lee, Yu Tsao

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply to mobile devices where computing…

Computation and Language · Computer Science · 2023-08-02 · Weixin Wu, Hankz Hankui Zhuo

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase…

Computation and Language · Computer Science · 2021-11-11 · Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science · 2025-03-27 · Tianhao Wu, Yu Wang, Ngoc Quach

Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference…

Computation and Language · Computer Science · 2022-05-02 · Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, only a little research has explored the…

Computation and Language · Computer Science · 2020-12-07 · Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu