Related papers: High-Accuracy Low-Precision Training

SWALP: Stochastic Weight Averaging in Low-Precision Training

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate…

Machine Learning · Computer Science 2019-05-21 Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Low-Precision Arithmetic for Fast Gaussian Processes

Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for…

Machine Learning · Computer Science 2022-07-15 Wesley J. Maddox, Andres Potapczynski, Andrew Gordon Wilson

A method for accelerating low precision operations by sparse matrix multiplication

In recent years, the fervent demand for computational power across various domains has prompted hardware manufacturers to introduce specialized computing hardware aimed at enhancing computational capabilities. Particularly, the utilization…

Numerical Analysis · Mathematics 2024-03-12 Hongyaoxing Gu

HALP: Hardware-Aware Latency Pruning

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

Why Does Stochastic Gradient Descent Slow Down in Low-Precision Training?

Low-precision training has become crucial for reducing the computational and memory costs of large-scale deep learning. However, quantizing gradients introduces magnitude shrinkage, which can change how stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2026-01-09 Vincent-Daniel Yun

Low-Precision Stochastic Gradient Langevin Dynamics

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing…

Machine Learning · Computer Science 2022-06-22 Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these…

Machine Learning · Computer Science 2023-12-06 Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick

Optimization of GNN Training Through Half-precision

Recent trends in lower precision, e.g. half-precision floating point, training have shown improved system performance and reduced memory usage for Deep Learning while maintaining accuracy. However, current GNN systems cannot achieve such…

Machine Learning · Computer Science 2025-09-17 Arnab Kanti Tarafder, Yidong Gong, Pradeep Kumar

Hardware-Efficient Mixed-Precision CP Tensor Decomposition

Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often consumes huge memory and computing cost. Meanwhile, modernized computing hardware such as tensor…

Optimization and Control · Mathematics 2022-09-12 Zi Yang, Junnan Shan, Zheng Zhang

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs

Quantized training of Large Language Models (LLMs) remains an open challenge, as maintaining accuracy while performing all matrix multiplications in low precision has proven difficult. This is particularly the case when fine-tuning…

Machine Learning · Computer Science 2025-11-06 Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

Efficient Relaxed Gradient Support Pursuit for Sparsity Constrained Non-convex Optimization

Large-scale non-convex sparsity-constrained problems have recently gained extensive attention. Most existing deterministic optimization methods (e.g., GraSP) are not suitable for large-scale and high-dimensional problems, and thus…

Machine Learning · Computer Science 2019-12-03 Fanhua Shang, Bingkun Wei, Hongying Liu, Yuanyuan Liu, Jiacheng Zhuo

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient descent trains neural networks with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

Stochastic optimization algorithms are widely used for machine learning with large-scale data. However, their convergence often suffers from non-vanishing variance. Variance Reduction (VR) methods, such as SVRG and SARAH, address this issue…

Machine Learning · Computer Science 2026-01-12 Daniil Medyakov, Gleb Molodtsov, Savelii Chezhegov, Alexey Rebrikov, Aleksandr Beznosikov

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Training with reduced precision of a support vector machine model for text classification

This paper presents the impact of using quantization on the efficiency of multi-class text classification in the training process of a support vector machine (SVM). This work is focused on comparing the efficiency of SVM model trained using…

Machine Learning · Computer Science 2020-07-20 Dominik Żurek, Marcin Pietroń

Towards Efficient Training for Neural Network Quantization

Quantization reduces computation costs of neural networks but suffers from performance degeneration. Is this accuracy drop due to the reduced capacity, or inefficient training during the quantization procedure? After looking into the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Qing Jin, Linjie Yang, Zhenyu Liao

Structural Pruning via Latency-Saliency Knapsack

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing…

Computer Vision and Pattern Recognition · Computer Science 2022-10-20 Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction

Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy the CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we…

Machine Learning · Computer Science 2024-08-07 Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li

Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding

LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a…

Machine Learning · Computer Science 2025-11-04 Taowen Liu, Marta Andronic, Deniz Gündüz, George A. Constantinides

Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing

For time-critical IoT applications using deep learning, inference acceleration through distributed computing is a promising approach to meet a stringent deadline. In this paper, we implement a working prototype of a new distributed…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang