Related papers: Memory Efficient Mixed-Precision Optimizers

Mixed Precision Training

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models…

Artificial Intelligence · Computer Science 2018-02-19 Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

Floating-Point Multiply-Add with Approximate Normalization for Low-Cost Matrix Engines

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point…

Hardware Architecture · Computer Science 2024-08-23 Kosmas Alexandridis, Christodoulos Peltekis, Dionysios Filippas, Giorgos Dimitrakopoulos

FlashOptim: Optimizers for Memory-Efficient Training

Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and one or more optimizer state variables.…

Machine Learning · Computer Science 2026-03-13 Jose Javier Gonzalez Ortiz, Abhay Gupta, Christopher Rinard, Davis Blalock

Combining Learning and Optimization for Transprecision Computing

The growing demands of the worldwide IT infrastructure stress the need for reduced power consumption, which is addressed in so-called transprecision computing by improving energy efficiency at the expense of precision. For example, reducing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-30 Andrea Borghesi, Giuseppe Tagliavini, Michele Lombardi, Luca Benini, Michela Milano

Precision-Aware Iterative Algorithms Based on Group-Shared Exponents of Floating-Point Numbers

Iterative solvers are frequently used in scientific applications and engineering computations. However, the memory-bound Sparse Matrix-Vector (SpMV) kernel computation hinders the efficiency of iterative algorithms. As modern hardware…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-08 Jianhua Gao, Jiayuan Shen, Yuxiang Zhang, Weixing Ji, Hua Huang

Sound Mixed-Precision Optimization with Rewriting

Finite-precision arithmetic computations face an inherent tradeoff between accuracy and efficiency. The points in this tradeoff space are determined, among other factors, by different data types but also evaluation orders. To put it simply,…

Programming Languages · Computer Science 2017-07-10 Eva Darulova, Einar Horn, Saksham Sharma

Training with Mixed-Precision Floating-Point Assignments

When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss.…

Machine Learning · Computer Science 2023-06-26 Wonyeol Lee, Rahul Sharma, Alex Aiken

Collage: Light-Weight Low-Precision Strategy for LLM Training

Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less…

Machine Learning · Computer Science 2024-05-07 Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan

Revisiting 16-bit Neural Network Training: A Practical Approach for Resource-Limited Learning

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during…

Machine Learning · Computer Science 2026-04-20 Juyoung Yun, Sol Choi, Francois Rameau, Byungkon Kang, Zhoulai Fu

Reduced and mixed precision turbulent flow simulations using explicit finite difference schemes

The use of reduced and mixed precision computing has gained increasing attention in high-performance computing (HPC) as a means to improve computational efficiency, particularly on modern hardware architectures like GPUs. In this work, we…

Computational Engineering, Finance, and Science · Computer Science 2025-05-28 Bálint Siklósi, Pushpender K. Sharma, David J. Lusher, István Z. Reguly, Neil D. Sandham

Low-Precision Floating-Point Schemes for Neural Network Training

The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance neural network training in terms of…

Machine Learning · Computer Science 2018-04-17 Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas

Mixed Precision Training With 8-bit Floating Point

Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit…

Machine Learning · Computer Science 2019-05-30 Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul

Half precision wave simulation

In recent years, half precision floating-point arithmetic has gained wide support in hardware and software stack thanks to the advance of artificial intelligence and machine learning applications. Operating at half precision can…

Numerical Analysis · Mathematics 2024-09-19 Longfei Gao, Kevin Harms

Speeding up and reducing memory usage for scientific machine learning via mixed precision

Scientific machine learning (SciML) has emerged as a versatile approach to address complex computational science and engineering problems. Within this field, physics-informed neural networks (PINNs) and deep operator networks (DeepONets)…

Machine Learning · Computer Science 2024-01-31 Joel Hayford, Jacob Goldman-Wetzler, Eric Wang, Lu Lu

Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-16 Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang

Constrained Precision Tuning

Precision tuning or customized precision number representations is emerging, in these recent years, as one of the most promising techniques that has a positive impact on the footprint of programs concerning energy consumption, bandwidth…

Software Engineering · Computer Science 2022-03-16 Dorra Ben Khalifa, Matthieu Martel

Gradient Methods with Memory for Minimizing Composite Functions

The recently introduced Gradient Methods with Memory use a subset of the past oracle information to create an accurate model of the objective function that enables them to surpass the Gradient Method in practical performance. The model…

Optimization and Control · Mathematics 2024-01-30 Mihai I. Florea

Mixed precision matrix interpolative decompositions for model reduction

Renewed interest in mixed-precision algorithms has emerged due to growing data capacity and bandwidth concerns, as well as the advancement of GPUs, which enable significant speedup for low precision arithmetic. In light of this, we propose…

Numerical Analysis · Mathematics 2020-12-14 Alec Michael Dunton, Alyson Fox

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Multistage Mixed Precision Iterative Refinement

Low precision arithmetic, in particular half precision floating point arithmetic, is now available in commercial hardware. Using lower precision can offer significant savings in computation and communication costs with proportional savings…

Numerical Analysis · Mathematics 2021-11-16 Eda Oktay, Erin Carson