Related papers: Multi-node Bert-pretraining: Cost-efficient Approa…

200 papers

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe…

Computation and Language · Computer Science 2021-09-10 Peter Izsak, Moshe Berchansky, Omer Levy

Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and…

Computation and Language · Computer Science 2022-06-17 Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li

Speeding up the large-scale distributed training is challenging in that it requires improving various components of training including load balancing, communication, optimizers, etc. We present novel approaches for fast large-scale training…

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to…

Computation and Language · Computer Science 2020-04-30 Ke Tran

In recent years, researchers tend to pre-train ever-larger language models to explore the upper limit of deep models. However, large language model pre-training costs intensive computational resources and most of the models are trained from…

Computation and Language · Computer Science 2021-10-15 Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and a huge amount of data, which…

Machine Learning · Computer Science 2020-09-21 Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

Pre-training models are an important tool in Natural Language Processing (NLP), while the BERT model is a classic pre-training model whose structure has been widely adopted by followers. It was even chosen as the reference model for the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-18 Jinle Zeng, Min Li, Zhihua Wu, Jiaqi Liu, Yuang Liu, Dianhai Yu, Yanjun Ma

Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvement in many Natural Language Processing (NLP) tasks. However, these models contain a large number of parameters. The emergence of even…

Computation and Language · Computer Science 2021-12-20 Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

The practical success of much of NLP depends on the availability of training data. However, in real-world scenarios, training data is often scarce, not least because many application domains are restricted and specific. In this work, we…

Computation and Language · Computer Science 2022-04-01 Marina Sedinkina, Martin Schmitt, Hinrich Schütze

Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In…

Machine Learning · Computer Science 2025-09-04 David Cortes, Carlos Juiz, Belen Bermejo

Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the…

Computation and Language · Computer Science 2022-08-04 Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge in interest in using large batch stochastic optimization methods to tackle this issue. The most prominent algorithm in…

The fast growing capabilities of large-scale deep learning models, such as Bert, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing…

Machine Learning · Computer Science 2024-04-15 Longwei Zou, Han Zhang, Yangdong Deng

Currently, the most widespread neural network architecture for training language models is the so-called BERT, which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a…

Computation and Language · Computer Science 2021-11-02 Jochen Zöllner, Konrad Sperfeld, Christoph Wick, Roger Labahn

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and…

Computation and Language · Computer Science 2022-02-28 Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

Training deep learning models can be computationally expensive. Prior works have shown that increasing the batch size can potentially lead to better overall throughput. However, the batch size is frequently limited by the accelerator memory…

Machine Learning · Computer Science 2023-01-25 Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the…

Computation and Language · Computer Science 2022-12-29 Jonas Geiping, Tom Goldstein

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years. While models covering a large number of languages have been introduced, their…

Computation and Language · Computer Science 2020-10-23 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

Pre-trained language models, such as BERT, have achieved significant accuracy gain in many natural language processing tasks. Despite its effectiveness, the huge number of parameters makes training a BERT model computationally very…

Computation and Language · Computer Science 2020-11-30 Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang