Related papers: A Runtime-Based Computational Performance Predicto…

PM2Lat: Highly Accurate and Generalized Prediction of DNN Execution Latency on GPUs

We present PM2Lat, a fast and generalized framework for accurately predicting the latency of deep neural network models on GPUs, with special focus on NVIDIA. Unlike prior methods that rely on deep learning models or handcrafted heuristics,…

Performance · Computer Science 2026-03-03 Truong-Thanh Le , Hoang-Loc La , Amir Taherkordi , Frank Eliassen , Phuong Hoai Ha , Peiyuan Guan

Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks

In this paper, we provide a fine-grain machine learning-based method, PerfNetV2, which improves the accuracy of our previous work for modeling the neural network performance on a variety of GPU accelerators. Given an application, the…

Machine Learning · Computer Science 2020-12-02 Chuan-Chi Wang , Ying-Chiao Liao , Ming-Chang Kao , Wen-Yew Liang , Shih-Hao Hung

Deep Learning Models on CPUs: A Methodology for Efficient Training

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Machine Learning · Computer Science 2023-06-21 Quchen Fu , Ramesh Chukka , Keith Achorn , Thomas Atta-fosu , Deepak R. Canchi , Zhongwei Teng , Jules White , Douglas C. Schmidt

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU

With the fast development of deep neural networks (DNNs), many real-world applications are adopting multiple models to conduct compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-30 Fuxun Yu , Shawn Bray , Di Wang , Longfei Shangguan , Xulong Tang , Chenchen Liu , Xiang Chen

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite…

Neural and Evolutionary Computing · Computer Science 2016-11-22 Matthew W. Moskewicz , Ali Jannesari , Kurt Keutzer

Performance Prediction for Convolutional Neural Networks in Edge Devices

Running Convolutional Neural Network (CNN) based applications on edge devices near the source of data can meet the latency and privacy challenges. However, due to their reduced computing resources and their energy constraints, these edge…

Computer Vision and Pattern Recognition · Computer Science 2020-10-23 Halima Bouzidi , Hamza Ouarnoughi , Smail Niar , Abdessamad Ait El Cadi

GPU Activity Prediction using Representation Learning

GPU activity prediction is an important and complex problem. This is due to the high level of contention among thousands of parallel threads. This problem was mostly addressed using heuristics. We propose a representation learning approach…

Machine Learning · Computer Science 2017-03-28 Aswin Raghavan , Mohamed Amer , Timothy Shields , David Zhang , Sek Chai

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Jonathan Lew , Deval Shah , Suchita Pati , Shaylin Cattell , Mengchi Zhang , Amruth Sandhupatla , Christopher Ng , Negar Goli , Matthew D. Sinclair , Timothy G. Rogers , Tor Aamodt

Forecasting GPU Performance for Deep Learning Training and Inference

Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the stream…

Machine Learning · Computer Science 2024-12-13 Seonho Lee , Amar Phanishayee , Divya Mahajan

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personal recommendation models, which consume the highest number of…

Hardware Architecture · Computer Science 2020-11-12 Bilge Acun , Matthew Murphy , Xiaodong Wang , Jade Nie , Carole-Jean Wu , Kim Hazelwood

ResPerfNet: Deep Residual Learning for Regressional Performance Modeling of Deep Neural Networks

The rapid advancements of computing technology facilitate the development of diverse deep learning applications. Unfortunately, the efficiency of parallel computing infrastructures varies widely with neural network models, which hinders the…

Machine Learning · Computer Science 2020-12-04 Chuan-Chi Wang , Ying-Chiao Liao , Chia-Heng Tu , Ming-Chang Kao , Wen-Yew Liang , Shih-Hao Hung

Impact of GPU uncertainty on the training of predictive deep neural networks

[retracted] We found out that the difference was dependent on the Chainer library, and does not replicate with another library (pytorch) which indicates that the results are probably due to a bug in Chainer, rather than being…

Machine Learning · Computer Science 2021-10-07 Maciej Pietrowski , Andrzej Gajda , Takuto Yamamoto , Taisuke Kobayashi , Lana Sinapayen , Eiji Watanabe

Predicting the Computational Cost of Deep Learning Models

Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict…

Machine Learning · Computer Science 2018-11-30 Daniel Justus , John Brennan , Stephen Bonner , Andrew Stephen McGough

Enhancing Deep Neural Network Training Efficiency and Performance through Linear Prediction

Deep neural networks (DNN) have achieved remarkable success in various fields, including computer vision and natural language processing. However, training an effective DNN model still poses challenges. This paper aims to propose a method…

Machine Learning · Computer Science 2024-07-03 Hejie Ying , Mengmeng Song , Yaohong Tang , Shungen Xiao , Zimin Xiao

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-21 Shaohuai Shi , Qiang Wang , Xiaowen Chu

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

The training process of Deep Neural Network (DNN) is compute-intensive, often taking days to weeks to train a DNN model. Therefore, parallel execution of DNN training on GPUs is a widely adopted approach to speed up the process nowadays.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-29 Chi-Chung Chen , Chia-Lin Yang , Hsiang-Yun Cheng

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs),…

Hardware Architecture · Computer Science 2026-05-04 Ali Emre Oztas , Mahir Demir , James Garside , Mikel Luján

Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training

Deep learning has become widely used in complex AI applications. Yet, training a deep neural network (DNN) model requires a considerable amount of calculations, long running time, and much energy. Nowadays, many-core AI accelerators (e.g.,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-12 Yuxin Wang , Qiang Wang , Shaohuai Shi , Xin He , Zhenheng Tang , Kaiyong Zhao , Xiaowen Chu

Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial [Study of the Efficiency of GPU Scalability for Artificial Intelligence Training]

Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In…

Machine Learning · Computer Science 2025-09-04 David Cortes , Carlos Juiz , Belen Bermejo

Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN…

Machine Learning · Computer Science 2022-07-26 Jong Youl Choi , Pei Zhang , Kshitij Mehta , Andrew Blanchard , Massimiliano Lupo Pasini