Computation and Language · Computer Science
Towards Fully 8-bit Integer Inference for the Transformer Model
Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao +2
2020-09-21
Machine Learning · Computer Science
Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek Menon +3
2019-06-10
Machine Learning · Computer Science
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
2022-11-11
Machine Learning · Computer Science
FP8 versus INT8 for efficient deep learning inference
Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren +7
2023-06-16
Machine Learning · Computer Science
Training and inference of large language models using 8-bit floating point
Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake +5
2023-10-02
Machine Learning · Computer Science
μnit Scaling: Simple and Scalable FP8 LLM Training
Saaketh Narayan, Abhay Gupta, Mansheej Paul, Davis Blalock
2025-06-06
Machine Learning · Computer Science
Training Transformers with 4-bit Integers
Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu
2023-06-26
Artificial Intelligence · Computer Science
FP8-BERT: Post-Training Quantization for Transformer
Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu
2023-12-13
Computation and Language · Computer Science
I-BERT: Integer-only BERT Quantization
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney +1
2022-05-02
Computation and Language · Computer Science
Efficient Inference For Neural Machine Translation
Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, Ilya Chatsviorkin
2020-10-08
Computation and Language · Computer Science
Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu +3
2019-06-06
Machine Learning · Computer Science
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kai Jun Teh +2
2024-07-23
Computation and Language · Computer Science
Very Deep Transformers for Neural Machine Translation
Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
2020-10-16
Computation and Language · Computer Science
Shallow-to-Deep Training for Neural Machine Translation
Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang +4
2020-10-09
Machine Learning · Computer Science
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang, Jia Wei, Jintao Zhang, Jun Zhu +1
2025-06-10
Computation and Language · Computer Science
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
2021-12-20
Hardware Architecture · Computer Science
Faster Inference of LLMs using FP8 on the Intel Gaudi
Joonhyung Lee, Shmulik Markovich-Golan, Daniel Ohayon, Yair Hanani +8
2025-03-18
Computation and Language · Computer Science
Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si +3
2019-11-15
Computation and Language · Computer Science
Scaling Neural Machine Translation
Myle Ott, Sergey Edunov, David Grangier, Michael Auli
2018-09-06
Computation and Language · Computer Science
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
Wenjun Wang, Shuo Cai, Congkai Xie, Mingfa Feng +6
2025-10-20
Machine Learning · Computer Science
FP8 Formats for Deep Learning
Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea +11
2022-10-03
Machine Learning · Computer Science
A Study of BFLOAT16 for Deep Learning Training
Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das +15
2019-06-14
Machine Learning · Computer Science
Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models
Jiecheng Zhou, Ding Tang, Rong Fu, Boni Hu +7
2025-02-18
Performance · Computer Science
Exploring the Potential of Flexible 8-bit Format: Design and Algorithm
Zhuoyi Zhang, Yunchen Zhang, Gonglei Shi, Yu Shen +5
2023-10-30