Related papers: LightSeq: A High Performance Inference Library for…

200 papers

Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and…

Computation and Language · Computer Science 2022-06-17 Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li

Transformer-based models have made a tremendous impact on natural language generation. However, inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop…

Computation and Language · Computer Science 2021-07-14 Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruofei Zhang

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformer-based models (e.g., BERT) power many important Web services, such as search, translation, and question answering. While enormous…

Computation and Language · Computer Science 2021-02-23 Dave Dice, Alex Kogan

Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Yifan Ding, Nicholas Botzer, Tim Weninger

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and…

Computation and Language · Computer Science 2019-04-03 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

Recently, large-scale transformer-based models have proven effective on various tasks across many domains. Nevertheless, applying them in industrial production requires tedious and heavy work to reduce inference costs. To fill…

Computation and Language · Computer Science 2022-05-25 Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao

Transformers have become a predominant machine learning workload: they are not only the de facto standard for natural language processing tasks, but are also being deployed in other domains such as vision and speech recognition. Many…

Machine Learning · Computer Science 2022-06-23 Ibrahim Ahmed, Sahil Parmar, Matthew Boyd, Michael Beidler, Kris Kang, Bill Liu, Kyle Roach, John Kim, Dennis Abts

Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of…

Information Retrieval · Computer Science 2024-08-20 Aleksandr V. Petrov, Craig Macdonald, Nicola Tonellotto

Transformer models have revolutionized natural language processing, achieving state-of-the-art performance and demonstrating remarkable scalability. However, their memory demands, particularly due to maintaining full context in memory, pose…

Computation and Language · Computer Science 2025-11-04 Juan Gabriel Kostelec, Qinghai Guo

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in practical scenarios, since such heavy models can hardly be deployed with limited resources. To…

Computation and Language · Computer Science 2020-04-30 Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

Pre-trained language models have shown stellar performance in various downstream tasks. However, this usually comes at the cost of high latency and computation, hindering their use in resource-limited settings. In this work, we propose a…

Computation and Language · Computer Science 2022-03-18 Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

The Transformer has been widely used thanks to its ability to capture sequence information efficiently. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper,…

Computation and Language · Computer Science 2020-02-17 Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

With the rapid development of Natural Language Processing (NLP) technology, the accuracy and efficiency of machine translation have become hot topics of research. This paper proposes a novel Seq2Seq model aimed at improving translation…

Computation and Language · Computer Science 2024-11-01 Yuxu Wu, Yiren Xing

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models…

Machine Learning · Computer Science 2023-02-21 Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu

Pre-trained Transformer-based models have achieved state-of-the-art performance on various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters and are thus too resource-hungry and…

Machine Learning · Computer Science 2021-09-29 Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Transformers (Vaswani et al., 2017) have gradually become a key component of many state-of-the-art natural language representation models. A recent Transformer-based model, BERT (Devlin et al., 2018), achieved state-of-the-art…

Computation and Language · Computer Science 2020-05-15 Ashish Khetan, Zohar Karnin

Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient…

Machine Learning · Computer Science 2020-08-18 Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, Jason Riesa

We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using…

Computation and Language · Computer Science 2018-11-22 Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius

While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to the attention…

Computation and Language · Computer Science 2026-04-10 Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang