Related papers

Related papers: Distributed Low Precision Training Without Mixed P…

200 papers

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution…

Machine Learning · Computer Science 2019-12-03 Alexey Svyatkovskiy, Julian Kates-Harbeck, William Tang

Low-precision formats have proven to be an efficient way to reduce not only the memory footprint but also the hardware resources and power consumption of deep learning computations. Under this premise, the posit numerical format appears to…

Machine Learning · Computer Science 2021-05-17 Gonçalo Raposo, Pedro Tomás, Nuno Roma

Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat however is the substantial amount of compute needed to train these deep learning models. Training a benchmark dataset like ImageNet on a…

Machine Learning · Computer Science 2018-10-30 Karanbir Chahal, Manraj Singh Grover, Kuntal Dey

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models…

When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss.…

Machine Learning · Computer Science 2023-06-26 Wonyeol Lee, Rahul Sharma, Alex Aiken

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during…

Machine Learning · Computer Science 2026-04-20 Juyoung Yun, Sol Choi, Francois Rameau, Byungkon Kang, Zhoulai Fu

In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data formats…

Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks, limiting the productivity and experimentation of deep learning practitioners. As networks grow in size and complexity,…

Computer Vision and Pattern Recognition · Computer Science 2020-06-18 Aditya Rajagopal, Diederik Adriaan Vink, Stylianos I. Venieris, Christos-Savvas Bouganis

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of…

Machine Learning · Computer Science 2015-02-11 Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan

Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit…

Machine Learning · Computer Science 2019-05-30 Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul

With the increasing size of Deep Neural Network (DNN) models, the high memory space requirements and computational complexity have become an obstacle for efficient DNN implementations. To ease this problem, using reduced-precision…

Machine Learning · Computer Science 2019-09-10 Jinming Lu, Siyuan Lu, Zhisheng Wang, Chao Fang, Jun Lin, Zhongfeng Wang, Li Du

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…

Machine Learning · Computer Science 2020-12-21 Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the…

Large-scale deep neural networks (DNN) have been successfully used in a number of tasks from image recognition to natural language processing. They are trained using large training sets on large models, making them computationally and…

Machine Learning · Computer Science 2017-03-28 Sek Chai, Aswin Raghavan, David Zhang, Mohamed Amer, Tim Shields

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy. As a result, deep learning…

Machine Learning · Computer Science 2021-03-09 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

CNNs have been shown to maintain reasonable classification accuracy when quantized to lower precisions. Quantizing to sub-8-bit activations and weights can result in accuracy falling below an acceptable threshold. Techniques exist for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-02 Philip Colangelo, Nasibeh Nasiri, Asit Mishra, Eriko Nurvitadhi, Martin Margala, Kevin Nealis

Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

Is multiplication really necessary for deep neural networks? Here we propose just adding two IEEE754 floating-point numbers with an integer-add instruction in place of a floating-point multiplication instruction. We show that ResNet can be…

Machine Learning · Computer Science 2020-12-08 Tsuguo Mogami
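The integer-add idea in the Mogami abstract can be illustrated with a small sketch. It assumes IEEE 754 binary32 operands that are nonzero and normal, and whose product neither overflows nor underflows. It uses the classic logarithmic (Mitchell-style) approximation: adding two bit patterns adds their exponents and approximately adds the logs of their mantissas, and subtracting the bit pattern of 1.0 removes the doubled exponent bias. This is not necessarily the exact scheme in the paper, and the helper names `f32_to_bits`, `bits_to_f32`, and `approx_mul` are hypothetical.

```python
import struct

# Bit pattern of 1.0 in IEEE 754 binary32: sign 0, biased exponent 127, mantissa 0.
ONE_BITS = 0x3F800000
SIGN_MASK = 0x80000000
MAG_MASK = 0x7FFFFFFF


def f32_to_bits(x: float) -> int:
    """Round x to float32 and reinterpret it as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret the low 32 bits of b as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_mul(a: float, b: float) -> float:
    """Approximate a * b using one integer add (plus a constant subtract).

    Assumes nonzero, normal float32 inputs and no overflow/underflow.
    """
    ba, bb = f32_to_bits(a), f32_to_bits(b)
    # The result sign is the XOR of the operand signs.
    sign = (ba ^ bb) & SIGN_MASK
    # Adding magnitudes adds exponents (and approximates adding log-mantissas);
    # subtracting ONE_BITS cancels the exponent bias that was counted twice.
    mag = (ba & MAG_MASK) + (bb & MAG_MASK) - ONE_BITS
    return bits_to_f32(sign | mag)


if __name__ == "__main__":
    for x, y in [(2.0, 3.0), (-2.0, 3.0), (1.5, 1.5), (0.7, 1.3)]:
        print(x, y, x * y, approx_mul(x, y))
```

Products of powers of two, or with one operand equal to a power of two, come out exact (e.g. `approx_mul(2.0, 3.0)` gives 6.0), while the worst case occurs when both mantissas are about 1.5: `approx_mul(1.5, 1.5)` returns 2.0 instead of 2.25, a relative error of roughly 11%.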