Related papers: GPU-Accelerated Primal Learning for Extremely Fast…

Optimizing Data Collection in Deep Reinforcement Learning

Reinforcement learning (RL) workloads take a notoriously long time to train due to the large number of samples collected at run-time from simulators. Unfortunately, cluster scale-up approaches remain expensive, and commonly used CPU…

Machine Learning · Computer Science 2022-07-19 James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme makes it possible to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner, Thomas Parnell, Martin Jaggi

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work is focused on the pruning of some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The Nvidia deep neural network (cuDnn) library is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń, Dominik Żurek

Leveraging GPU batching for scalable nonlinear programming through massive Lagrangian decomposition

We present the implementation of a trust-region Newton algorithm ExaTron for bound-constrained nonlinear programming problems, fully running on multiple GPUs. Without data transfers between CPU and GPU, our implementation has achieved the…

Optimization and Control · Mathematics 2021-06-30 Youngdae Kim, François Pacaud, Kibaek Kim, Mihai Anitescu

Deep Learning Models on CPUs: A Methodology for Efficient Training

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Machine Learning · Computer Science 2023-06-21 Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, Douglas C. Schmidt

Benchmarking GPU and TPU Performance with Graph Neural Networks

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural network models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 Xiangyang Ju, Yunsong Wang, Daniel Murnane, Nicholas Choma, Steven Farrell, Paolo Calafiura

Large-Scale Stochastic Learning using GPUs

In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU…

Machine Learning · Computer Science 2017-02-24 Thomas Parnell, Celestine Dünner, Kubilay Atasu, Manolis Sifalakis, Haris Pozidis

DOGE-Train: Discrete Optimization on GPU with End-to-end Training

We present a fast, scalable, data-driven approach for solving relaxations of 0-1 integer linear programs. We use a combination of graph neural networks (GNN) and the Lagrange decomposition based algorithm FastDOG (Abbas and Swoboda 2022b).…

Machine Learning · Computer Science 2024-01-01 Ahmed Abbas, Paul Swoboda

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper is focused on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The Nvidia deep neural network (cuDnn) library provides the most effective implementation…

Machine Learning · Computer Science 2022-01-03 Marcin Pietroń, Dominik Żurek

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

Recent advances in transformer-based foundation models have made them the default choice for many tasks, but their rapidly growing size makes fitting a full model on a single GPU increasingly difficult and their computational cost…

Machine Learning · Computer Science 2026-01-21 Pierre Abillama, Changwoo Lee, Juechu Dong, David Blaauw, Dennis Sylvester, Hun-Seok Kim

Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN…

Machine Learning · Computer Science 2022-07-26 Jong Youl Choi, Pei Zhang, Kshitij Mehta, Andrew Blanchard, Massimiliano Lupo Pasini

L2PF -- Learning to Prune Faster

Various applications in the field of autonomous driving are based on convolutional neural networks (CNNs), especially for processing camera data. The optimization of such CNNs is a major challenge in continuous development. Newly learned…

Computer Vision and Pattern Recognition · Computer Science 2021-01-08 Manoj-Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, Mhd Ali Moraly, Aquib Jamal, Lukas Frickenstein, Christian Unger, Naveen-Shankar Nagaraja, Walter Stechele

Accelerating DNN Training with Structured Data Gradient Pruning

Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally do not…

Machine Learning · Computer Science 2022-02-03 Bradley McDanel, Helia Dinh, John Magallanes

Graph Neural Network Training with Data Tiering

Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNNs efficiently is challenging because: 1) GPU…

Machine Learning · Computer Science 2021-11-12 Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, Wen-mei Hwu

GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU…

Computer Vision and Pattern Recognition · Computer Science 2013-12-24 Thomas Paine, Hailin Jin, Jianchao Yang, Zhe Lin, Thomas Huang

Accelerating a Linear Programming Algorithm on AMD GPUs

Linear Programming (LP) is a foundational optimization technique with widespread applications in finance, energy trading, and supply chain logistics. However, traditional Central Processing Unit (CPU)-based LP solvers often struggle to meet…

Optimization and Control · Mathematics 2025-08-26 Xiyan Hu, Titus Parker, Connor Phillips, Yifa Yu

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

As inference on Large Language Models (LLMs) emerges as an important workload in machine learning applications, weight quantization has become a standard technique for efficient GPU deployment. Quantization not only reduces model size, but…

Machine Learning · Computer Science 2024-08-22 Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh

LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-31 Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Quresh, Scott Mahlke, Wen-mei Hwu

Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large scale Artificial Neural Networks (ANN) training using multi-GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-16 Linnan Wang, Wei Wu, Jianxiong Xiao, Yang Yi

Accelerating Visual-Policy Learning through Parallel Differentiable Simulation

In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouples the rendering process from the computation…

Machine Learning · Computer Science 2025-11-12 Haoxiang You, Yilang Liu, Ian Abraham