Related papers: Efficient GPT Model Pre-training using Tensor Trai…

200 papers

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces…

Computation and Language · Computer Science 2024-10-07 Mingxue Xu, Yao Lei Xu, Danilo P. Mandic

In recent years, researchers tend to pre-train ever-larger language models to explore the upper limit of deep models. However, large language model pre-training costs intensive computational resources and most of the models are trained from…

Computation and Language · Computer Science 2021-10-15 Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous,…

Computation and Language · Computer Science 2020-02-20 Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, Ivan Oseledets

Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory…

Computation and Language · Computer Science 2020-03-17 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

Recent work explored the potential of large-scale Transformer-based pre-trained models, especially Pre-trained Language Models (PLMs) in natural language processing. This raises many concerns from various perspectives, e.g., financial costs…

Computation and Language · Computer Science 2022-05-23 Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu

We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and…

Machine Learning · Computer Science 2024-01-30 V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein

Pre-trained language models have recently emerged as a powerful tool for fine-tuning a variety of language tasks. Ideally, when models are pre-trained on a large amount of data, they are expected to gain implicit knowledge. In this paper, we…

Computation and Language · Computer Science 2023-06-22 Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the…

Computation and Language · Computer Science 2018-05-03 Martin Popel, Ondřej Bojar

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or…

Computation and Language · Computer Science 2021-05-12 M. Aßenmacher, P. Schulze, C. Heumann

Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years. While the results from these models are impressive, applying them can be extremely computationally expensive, as is…

Computation and Language · Computer Science 2020-08-18 Davis Yoshida, Allyson Ettinger, Kevin Gimpel

The rapid advancement in Large Language Models has been met with significant challenges in their training processes, primarily due to their considerable computational and memory demands. This research examines parallelization techniques…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-27 Ishan Patwardhan, Shubham Gandhi, Om Khare, Amit Joshi, Suraj Sawant

Transformer-based language models create hidden representations of their inputs at every layer, but only use final-layer representations for prediction. This obscures the internal decision-making process of the model and the utility of its…

Computation and Language · Computer Science 2024-06-21 Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva

In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to a deeper model depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable…

Computation and Language · Computer Science 2023-04-12 Peiyu Liu, Ze-Feng Gao, Yushuo Chen, Wayne Xin Zhao, Ji-Rong Wen

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs…

Machine Learning · Computer Science 2022-01-26 David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks. The success…

Computation and Language · Computer Science 2021-10-18 Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with…

Machine Learning · Computer Science 2024-10-21 Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token…

Machine Learning · Computer Science 2023-07-10 Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

In recent years, Long Short-Term Memory (LSTM) has become a popular choice for speech separation and speech enhancement tasks. The capability of an LSTM network can be enhanced by widening and adding more layers. However, this would introduce…

Sound · Computer Science 2018-12-27 Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh

Large-scale Transformer models have significantly promoted the recent development of natural language processing applications. However, little effort has been made to unify the effective models. In this paper, driven by providing a new set…

Computation and Language · Computer Science 2022-04-12 Dezhou Shen