Related papers

Related papers: BB-ML: Basic Block Performance Prediction using Ma…

200 papers

Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed training, crash inference…

Machine Learning · Computer Science 2022-01-31 Heting Liu , Zhichao Li , Cheng Tan , Rongqiu Yang , Guohong Cao , Zherui Liu , Chuanxiong Guo

Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-27 Zhongyi Lin , Ning Sun , Pallab Bhattacharya , Xizhou Feng , Louis Feng , John D. Owens

Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior, or improving processes in public health. These applications consist of complex software…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-30 Alexandre Maros , Fabricio Murai , Ana Paula Couto da Silva , Jussara M. Almeida , Marco Lattuada , Eugenio Gianniti , Marjan Hosseini , Danilo Ardagna

Machine learning (ML) provides algorithms to create computer programs based on data without explicitly programming them. In business process management (BPM), ML applications are used to analyse and improve processes efficiently. Three…

Machine Learning · Computer Science 2024-05-28 Sven Weinzierl , Sandra Zilker , Sebastian Dunzer , Martin Matzner

Training Large Language Models (LLMs) is one of the most compute-intensive tasks in high-performance computing. Predicting end-to-end training time for multi-billion parameter models distributed across hundreds of GPUs remains challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Biyao Zhang , Mingkai Zheng , Debargha Ganguly , Xuecen Zhang , Vikash Singh , Vipin Chaudhary , Zhao Zhang

Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…

Performance · Computer Science 2025-12-22 Karthik Prabhakar , Durgamadhab Mishra

This paper explores the application of machine learning (ML) techniques in predicting the QPU processing time of quantum jobs. By leveraging ML algorithms, this study introduces predictive models that are designed to enhance operational…

Next location prediction is a discipline that involves predicting a user's next location. Its applications include resource allocation, quality of service, energy efficiency, and traffic management. This paper proposes an energy-efficient,…

Machine Learning · Computer Science 2024-02-05 Calvin Jary , Nafiseh Kahani

We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel…

Machine Learning · Computer Science 2022-11-18 Zhongyi Lin , Louis Feng , Ehsan K. Ardestani , Jaewon Lee , John Lundell , Changkyu Kim , Arun Kejariwal , John D. Owens

The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. In fact, application performance can be heavily affected by how…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-02 Riccardo Cantini , Fabrizio Marozzo , Alessio Orsino , Domenico Talia , Paolo Trunfio , Rosa M. Badia , Jorge Ejarque , Fernando Vazquez

Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications, such as smart phones and automated driving. Nowadays, more applications require ML on tiny devices with extremely…

Machine Learning · Computer Science 2021-11-15 Yuhong Song , Edwin Hsing-Mean Sha , Qingfeng Zhuge , Rui Xu , Yongzhuo Zhang , Bingzhe Li , Lei Yang

Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis. Since ML is a data-driven approach, it seemingly fits into our daily lives and operations as well as complex and…

Machine Learning · Computer Science 2021-11-25 M. Z. Naser , Amir Alavi

Hyperparameters in machine learning (ML) have received a fair amount of attention, and hyperparameter tuning has come to be regarded as an important step in the ML pipeline. But just how useful is said tuning? While smaller-scale…

Machine Learning · Computer Science 2022-09-05 Moshe Sipper

Large language model (LLM) training today runs on clusters spanning thousands of GPUs. While this scale enables rapid model advances, developing, debugging, and performance-tuning the training framework inevitably becomes complex and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Shaoke Xi , ChonLam Lao , Boyi Jia , Jiaqi Gao , Zhipeng Zhang , Jiamin Cao , Brian Sutioso , Erci Xu , Minlan Yu , Kui Ren , Yong Li , Zhengping Qian , Ennan Zhai , Jingren Zhou

We investigate large language model performance across five orders of magnitude of compute scaling in eleven recent model architectures. We show that average benchmark performance, aggregating over many individual tasks and evaluations as…

Machine Learning · Computer Science 2024-01-11 David Owen

Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landscape of GPGPU. Meanwhile, Large Language…

Performance · Computer Science 2025-03-17 Khoi N. M. Nguyen , Hoang Duy Nguyen Do , Huyen Thao Le , Thanh Tuan Dao

Traditional logic programming relies on symbolic computation on the CPU, which can limit performance for large-scale inference tasks. Recent advances in GPU hardware enable high-throughput matrix operations, motivating a shift toward…

Symbolic Computation · Computer Science 2025-08-20 Lun Ai

This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their…

Performance · Computer Science 2017-06-06 Elmar Peise

As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) errors occur. It is well known that OoM interrupts…

Machine Learning · Computer Science 2025-12-10 Jinwoo Jeong , Minchul Kang , Younghun Go , Changyong Shin , Hyunho Lee , Junho Yoon , Gyeongsik Yang , Chuck Yoo

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for…
