Related papers: BB-ML: Basic Block Performance Prediction using Ma…

Prediction of GPU Failures Under Deep Learning Workloads

Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed training, crash inference…

Machine Learning · Computer Science 2022-01-31 Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-27 Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens

Machine Learning for Performance Prediction of Spark Cloud Applications

Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior or improving processes in public health. These applications consist of complex software…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-30 Alexandre Maros, Fabricio Murai, Ana Paula Couto da Silva, Jussara M. Almeida, Marco Lattuada, Eugenio Gianniti, Marjan Hosseini, Danilo Ardagna

Machine learning in business process management: A systematic literature review

Machine learning (ML) provides algorithms to create computer programs based on data without explicitly programming them. In business process management (BPM), ML applications are used to analyse and improve processes efficiently. Three…

Machine Learning · Computer Science 2024-05-28 Sven Weinzierl, Sandra Zilker, Sebastian Dunzer, Martin Matzner

Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM

Training Large Language Models (LLMs) is one of the most compute-intensive tasks in high-performance computing. Predicting end-to-end training time for multi-billion parameter models distributed across hundreds of GPUs remains challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Biyao Zhang, Mingkai Zheng, Debargha Ganguly, Xuecen Zhang, Vikash Singh, Vipin Chaudhary, Zhao Zhang

Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization

Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…

Performance · Computer Science 2025-12-22 Karthik Prabhakar, Durgamadhab Mishra

Quantum Processing Unit (QPU) processing time Prediction with Machine Learning

This paper explores the application of machine learning (ML) techniques in predicting the QPU processing time of quantum jobs. By leveraging ML algorithms, this study introduces predictive models that are designed to enhance operational…

Quantum Physics · Physics 2025-10-24 Lucy Xing, Sanjay Vishwakarma, David Kremer, Francisco Martin-Fernandez, Ismael Faro, Juan Cruz-Benito

An Accurate and Low-Parameter Machine Learning Architecture for Next Location Prediction

Next location prediction is a discipline that involves predicting a user's next location. Its applications include resource allocation, quality of service, energy efficiency, and traffic management. This paper proposes an energy-efficient,…

Machine Learning · Computer Science 2024-02-05 Calvin Jary, Nafiseh Kahani

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel…

Machine Learning · Computer Science 2022-11-18 Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

Block size estimation for data partitioning in HPC applications using machine learning techniques

The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. In fact, application performance can be heavily affected by how…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-02 Riccardo Cantini, Fabrizio Marozzo, Alessio Orsino, Domenico Talia, Paolo Trunfio, Rosa M. Badia, Jorge Ejarque, Fernando Vazquez

BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML

Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications, such as smart phones and automated driving. Nowadays, more applications require ML on tiny devices with extremely…

Machine Learning · Computer Science 2021-11-15 Yuhong Song, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Rui Xu, Yongzhuo Zhang, Bingzhe Li, Lei Yang

Insights into Performance Fitness and Error Metrics for Machine Learning

Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis. Since ML is a data-driven approach, it seemingly fits into our daily lives and operations as well as complex and…

Machine Learning · Computer Science 2021-11-25 M. Z. Naser, Amir Alavi

High Per Parameter: A Large-Scale Study of Hyperparameter Tuning for Machine Learning Algorithms

Hyperparameters in machine learning (ML) have received a fair amount of attention, and hyperparameter tuning has come to be regarded as an important step in the ML pipeline. But just how useful is said tuning? While smaller-scale…

Machine Learning · Computer Science 2022-09-05 Moshe Sipper

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

Large language model (LLM) training today runs on clusters spanning thousands of GPUs. While this scale enables rapid model advances, developing, debugging, and performance-tuning the training framework inevitably becomes complex and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Shaoke Xi, ChonLam Lao, Boyi Jia, Jiaqi Gao, Zhipeng Zhang, Jiamin Cao, Brian Sutioso, Erci Xu, Minlan Yu, Kui Ren, Yong Li, Zhengping Qian, Ennan Zhai, Jingren Zhou

How predictable is language model benchmark performance?

We investigate large language model performance across five orders of magnitude of compute scaling in eleven recent model architectures. We show that average benchmark performance, aggregating over many individual tasks and evaluations as…

Machine Learning · Computer Science 2024-01-11 David Owen

LLMPerf: GPU Performance Modeling meets Large Language Models

Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landscape of GPGPU. Meanwhile, Large Language…

Performance · Computer Science 2025-03-17 Khoi N. M. Nguyen, Hoang Duy Nguyen Do, Huyen Thao Le, Thanh Tuan Dao

Boolean Matrix Logic Programming on the GPU

Traditional logic programming relies on symbolic computation on the CPU, which can limit performance for large-scale inference tasks. Recent advances in GPU hardware enable high-throughput matrix operations, motivating a shift toward…

Symbolic Computation · Computer Science 2025-08-20 Lun Ai

Performance Modeling and Prediction for Dense Linear Algebra

This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their…

Performance · Computer Science 2017-06-06 Elmar Peise

GPU Memory Prediction for Multimodal Model Training

As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) errors occur. It is well known that OoM interrupts…

Machine Learning · Computer Science 2025-12-10 Jinwoo Jeong, Minchul Kang, Younghun Go, Changyong Shin, Hyunho Lee, Junho Yoon, Gyeongsik Yang, Chuck Yoo

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for…

Machine Learning · Computer Science 2023-08-24 Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng