Related papers: Revisiting BFloat16 Training

Revisiting 16-bit Neural Network Training: A Practical Approach for Resource-Limited Learning

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during…

Machine Learning · Computer Science · 2026-04-20 · Juyoung Yun, Sol Choi, Francois Rameau, Byungkon Kang, Zhoulai Fu

Training Deep Neural Networks with 8-bit Floating Point Numbers

The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller…

Machine Learning · Computer Science · 2018-12-20 · Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan

Mixed Precision Training With 8-bit Floating Point

Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit…

Machine Learning · Computer Science · 2019-05-30 · Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul

A Study of BFLOAT16 for Deep Learning Training

This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language…

Machine Learning · Computer Science · 2019-06-14 · Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
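The BFLOAT16 format keeps float32's sign bit and 8-bit exponent but only 7 explicit fraction bits, so converting a float32 amounts to rounding away its low 16 bits. A minimal sketch of that conversion in plain Python, assuming round-to-nearest-even (some hardware simply truncates) and returning a canonical quiet NaN instead of preserving NaN payloads; the helper names are hypothetical and not taken from the paper.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x (x is first rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret a 32-bit integer as a float32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 with round-to-nearest-even; return the 16-bit pattern."""
    if math.isnan(x):
        return 0x7FC0  # canonical quiet NaN; payloads are not preserved in this sketch
    b = f32_bits(x)
    lsb = (b >> 16) & 1             # lowest surviving bit, used to break ties toward even
    rounding_bias = 0x7FFF + lsb    # add just under half an ulp, plus 1 if lsb is odd
    return ((b + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float(h: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by appending 16 zero bits."""
    return bits_f32(h << 16)


if __name__ == "__main__":
    for v in [1.0, 3.14159265, 1e-3, 65504.0, 1e38]:
        h = to_bf16_bits(v)
        print(f"{v!r:>12} -> 0x{h:04X} -> {bf16_to_float(h)!r}")
```

Because the exponent field is unchanged, a value such as 1e38, which overflows FP16, stays finite in bfloat16; the cost is precision, e.g. 3.14159265 becomes 3.140625.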

Continuous 16-bit Training: Accelerating 32-bit Pre-Trained Neural Networks

In the field of deep learning, the prevalence of models initially trained with 32-bit precision is a testament to its robustness and accuracy. However, the continuous evolution of these models often demands further training, which can be…

Machine Learning · Computer Science · 2023-12-04 · Juyoung Yun

Deep Learning with Limited Numerical Precision

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of…

Machine Learning · Computer Science · 2015-02-11 · Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan

To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability

The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision…

Machine Learning · Computer Science · 2025-03-26 · Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
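The papers in this list move between FP32, FP16, BF16 and FP8. A small calculator for IEEE-754-style binary formats makes the trade-off concrete: exponent bits set the dynamic range, fraction bits set the spacing near 1.0. This is a sketch under the assumption that the all-ones exponent is reserved for infinities and NaNs, which holds for FP32, FP16, BF16 and the E5M2 FP8 variant; the commonly used E4M3 variant deviates from this convention and is deliberately left out. `FloatFormat` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exp_bits: int
    man_bits: int  # explicit fraction bits; the leading 1 of normal numbers is implicit

    @property
    def bias(self) -> int:
        return (1 << (self.exp_bits - 1)) - 1

    @property
    def max_normal(self) -> float:
        # Largest finite value: the all-ones exponent is reserved for inf/NaN,
        # so the top finite unbiased exponent equals the bias.
        return (2.0 - 2.0 ** -self.man_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        # Smallest positive normal number (exponent field = 1).
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.man_bits


FORMATS = [
    FloatFormat("FP32", 8, 23),
    FloatFormat("BF16", 8, 7),
    FloatFormat("FP16", 5, 10),
    FloatFormat("FP8-E5M2", 5, 2),
]

if __name__ == "__main__":
    for f in FORMATS:
        print(f"{f.name:<9} max={f.max_normal:.6g} "
              f"min_normal={f.min_normal:.6g} eps={f.epsilon:.6g}")
```

Running it shows why BF16 matches FP32's range (same 8 exponent bits) while FP16 saturates at 65504, and why E5M2's spacing of 0.25 near 1.0 is so coarse.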

The Hidden Power of Pure 16-bit Floating-Point Neural Networks

Lowering the precision of neural networks from the prevalent 32-bit precision has long been considered harmful to performance, despite the gain in space and time. Many works propose various techniques to implement half-precision neural…

Machine Learning · Computer Science · 2024-05-06 · Juyoung Yun, Byungkon Kang, Zhoulai Fu

Low-Precision Floating-Point Schemes for Neural Network Training

The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating-point arithmetic to enhance neural network training in terms of…

Machine Learning · Computer Science · 2018-04-17 · Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas
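The abstract above refers to fixed-point arithmetic combined with stochastic rounding. A minimal sketch of stochastic rounding onto a signed fixed-point grid, assuming a grid step of 2**-frac_bits and saturation to a two's-complement word of word_bits bits; the function name and parameters are hypothetical and this is not the specific scheme evaluated in the paper. A value is rounded up with probability equal to its fractional distance above the lower grid point, so the rounding is unbiased in expectation for in-range values.

```python
import math
import random


def stochastic_round_fixed(x: float, frac_bits: int, word_bits: int,
                           rng: random.Random) -> float:
    """Stochastically round x onto a fixed-point grid with step 2**-frac_bits.

    Out-of-range results saturate to the limits of a signed word_bits integer.
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    prob_up = scaled - lower                 # fractional part, in [0, 1)
    q = lower + (1 if rng.random() < prob_up else 0)
    # Saturate to the two's-complement range of word_bits bits.
    q_min = -(1 << (word_bits - 1))
    q_max = (1 << (word_bits - 1)) - 1
    q = max(q_min, min(q_max, q))
    return q / scale


if __name__ == "__main__":
    rng = random.Random(0)
    x = 0.3
    samples = [stochastic_round_fixed(x, frac_bits=2, word_bits=8, rng=rng)
               for _ in range(100_000)]
    print("values seen:", sorted(set(samples)))
    print("mean:", sum(samples) / len(samples))
    print("round-to-nearest:", round(x * 4) / 4)
```

With a step of 0.25, round-to-nearest maps 0.3 to 0.25 every time, while the stochastic version returns 0.25 or 0.5 with an average close to 0.3; this is what lets small gradient updates survive on average instead of being rounded away.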

Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations

In recent years fused-multiply-add (FMA) units with lower-precision multiplications and higher-precision accumulation have proven useful in machine learning/artificial intelligence applications, most notably in training deep neural networks…

Mathematical Software · Computer Science · 2019-04-16 · Greg Henry, Ping Tak Peter Tang, Alexander Heinecke
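The abstract above mentions FMA units that multiply in lower precision and accumulate in higher precision. A minimal pure-Python simulation of that general idea, assuming bfloat16 inputs, float32 accumulation, and finite inputs well inside the float32 range; it is a generic illustration of mixed-precision accumulation, not the higher-precision technique of Henry, Tang and Heinecke, which the truncated abstract does not describe. All function names are hypothetical.

```python
import struct
from typing import Sequence


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round x to float32, then to bfloat16 (round-to-nearest-even). Finite inputs only."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    b = (b + 0x7FFF + ((b >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", b))[0]


def dot_bf16_mul_f32_acc(a: Sequence[float], b: Sequence[float]) -> float:
    """bfloat16 inputs, exact products, float32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # Two 8-bit significands multiply to at most 16 bits, so the product is exact.
        prod = to_bf16(x) * to_bf16(y)
        acc = to_f32(acc + prod)   # round the running sum to float32 after each add
    return acc


def dot_bf16_everywhere(a: Sequence[float], b: Sequence[float]) -> float:
    """Same dot product, but the accumulator is also kept in bfloat16."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_bf16(to_bf16(x) * to_bf16(y))
        acc = to_bf16(acc + prod)
    return acc


if __name__ == "__main__":
    ones = [1.0] * 1000
    print("bf16 products, float32 accumulation:", dot_bf16_mul_f32_acc(ones, ones))
    print("bf16 products, bf16 accumulation:   ", dot_bf16_everywhere(ones, ones))
```

With 1000 terms the float32 accumulator returns 1000.0, while the bfloat16 accumulator stalls at 256.0: once the running sum reaches 256, adding 1 lands exactly halfway between 256 and 258, and round-to-nearest-even keeps 256.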

Mixed Precision Training

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models…

Artificial Intelligence · Computer Science · 2018-02-19 · Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit

Low-precision formats have proven to be an efficient way to reduce not only the memory footprint but also the hardware resources and power consumption of deep learning computations. Under this premise, the posit numerical format appears to…

Machine Learning · Computer Science · 2021-05-17 · Gonçalo Raposo, Pedro Tomás, Nuno Roma

Unit Scaling: Out-of-the-Box Low-Precision Training

We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. Training in FP16 or the recently proposed FP8 formats offers substantial efficiency gains, but can lack…

Machine Learning · Computer Science · 2023-06-01 · Charlie Blake, Douglas Orr, Carlo Luschi

Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

Recent work has shown that 8-bit floating point (FP8) can be used for efficiently training neural networks with reduced computational cost compared to training in FP32/FP16. In this work, we investigate the use of FP8 training in a…

Machine Learning · Computer Science · 2025-07-31 · Bokun Wang, Axel Berg, Durmus Alp Emre Acar, Chuteng Zhou

Floating-Point Multiply-Add with Approximate Normalization for Low-Cost Matrix Engines

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point…

Hardware Architecture · Computer Science · 2024-08-23 · Kosmas Alexandridis, Christodoulos Peltekis, Dionysios Filippas, Giorgos Dimitrakopoulos

Distributed Low Precision Training Without Mixed Precision

Low-precision training is one of the most popular strategies for deploying deep models on limited hardware resources. A fixed-point implementation of DCNs has the potential to alleviate complexities and facilitate deployment on…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-30 · Zehua Cheng, Weiyang Wang, Yan Pan, Thomas Lukasiewicz

Defeating the Training-Inference Mismatch via FP16

Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has attempted to mitigate this issue through…

Machine Learning · Computer Science · 2025-10-31 · Penghui Qi, Zichen Liu, Xiangxin Zhou, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin

Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a…

Machine Learning · Computer Science · 2023-11-21 · Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickaël Bruno

Collage: Light-Weight Low-Precision Strategy for LLM Training

Large-model training is plagued by intense compute costs and limited hardware memory. A practical solution is low-precision representation, but it is troubled by loss of numerical accuracy and unstable training, rendering the model less…

Machine Learning · Computer Science · 2024-05-07 · Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan

Training with Mixed-Precision Floating-Point Assignments

When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss.…

Machine Learning · Computer Science · 2023-06-26 · Wonyeol Lee, Rahul Sharma, Alex Aiken