Related papers: Time-Based Roofline for Deep Learning Performance …

Hierarchical Roofline Performance Analysis for Deep Learning Applications

This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit for broader support of a range of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-26 Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

Applying the Roofline model for Deep Learning performance optimizations

In this paper we present a methodology for creating Roofline models automatically for Non-Uniform Memory Access (NUMA) using Intel Xeon as an example. Finally, we present an evaluation of highly efficient deep learning primitives as…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Jacek Czaja, Michal Gallus, Joanna Wozna, Adam Grygielski, Luo Tao

Ridgeline: A 2D Roofline Model for Distributed Systems

In this short paper, we introduce the Ridgeline model, an extension of the Roofline model [4] for distributed systems. The Roofline model targets shared memory systems, bounding the performance of a kernel based on its operational…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Fabio Checconi, Jesmin Jahan Tithi, Fabrizio Petrini

The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks

We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks. The Sparsity Roofline jointly models network accuracy, sparsity, and theoretical inference speedup. Our approach does not require…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Cameron Shinn, Collin McCarthy, Saurav Muralidharan, Muhammad Osama, John D. Owens

Learning Runtime Parameters in Computer Systems with Delayed Experience Injection

Learning effective configurations in computer systems without hand-crafting models for every parameter is a long-standing problem. This paper investigates the use of deep reinforcement learning for runtime parameters of cloud databases…

Machine Learning · Computer Science 2016-11-01 Michael Schaarschmidt, Felix Gessert, Valentin Dalibard, Eiko Yoneki

How to keep pushing ML accelerator performance? Know your rooflines!

The rapidly growing importance of Machine Learning (ML) applications, coupled with their ever-increasing model size and inference energy footprint, has created a strong need for specialized ML hardware architectures. Numerous ML…

Hardware Architecture · Computer Science 2025-05-26 Marian Verhelst, Luca Benini, Naveen Verma

A Generic Performance Model for Deep Learning in a Distributed Environment

Performance modelling of a deep learning application is essential to improve and quantify the efficiency of the model framework. However, existing performance models are mostly case-specific, with limited capability for the new deep…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-22 Tulasi Kavarakuntla, Liangxiu Han, Huw Lloyd, Annabel Latham, Anthony Kleerekoper, Samson B. Akintoye

DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns

Deep learning applications are computation-intensive and often employ GPU as the underlying computing devices. Deep learning frameworks provide powerful programming interfaces, but the gap between source codes and practical GPU operations…

Software Engineering · Computer Science 2017-07-13 Jiazhen Gu, Huan Liu, Yangfan Zhou, Xin Wang

Data Complexity-aware Deep Model Performance Forecasting

Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…

Machine Learning · Computer Science 2026-01-06 Yen-Chia Chen, Hsing-Kuo Pao, Hanjuan Huang

Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

This paper surveys a range of methods to collect necessary performance data on Intel CPUs and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Charlene Yang

A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline

Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy-intensive due to high data rates and reliance on complex inference algorithms, which limits its…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert P. Dick, Dongsheng Li, Tun Lu, Ning Gu, Li Shang

Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks

Performance optimization of deep learning models is conducted either manually or through automatic architecture search, or a combination of both. On the other hand, their performance strongly depends on the target hardware and how…

Machine Learning · Computer Science 2022-09-23 Vahid Partovi Nia, Alireza Ghaffari, Mahdi Zolnouri, Yvon Savaria

Autotuning Benchmarking Techniques: A Roofline Model Case Study

Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Jacob Odgård Tørring, Jan Christian Meyer, Anne C. Elster

Metrics and Design of an Instruction Roofline Model for AMD GPUs

Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-11 Matthew Leinhauser, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, Sunita Chandrasekaran

Resource-Efficient Deep Learning: A Survey on Model-, Arithmetic-, and Implementation-Level Techniques

Deep learning is pervasive in our daily life, including self-driving cars, virtual assistants, social network services, healthcare services, face recognition, etc. However, deep neural networks demand substantial compute resources during…

Machine Learning · Computer Science 2024-04-30 JunKyu Lee, Lev Mukhanov, Amir Sabbagh Molahosseini, Umar Minhas, Yang Hua, Jesus Martinez del Rincon, Kiril Dichev, Cheol-Ho Hong, Hans Vandierendonck

On the performance of deep learning models for time series classification in streaming

Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in…

Machine Learning · Computer Science 2020-04-07 Pedro Lara-Benítez, Manuel Carranza-García, Francisco Martínez-Álvarez, José C. Riquelme

Predicting the Computational Cost of Deep Learning Models

Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict…

Machine Learning · Computer Science 2018-11-30 Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough

Accelerating Deep Learning with Fixed Time Budget

The success of modern deep learning is attributed to two key elements: huge amounts of training data and large model sizes. Where a vast amount of data allows the model to learn more features, the large model architecture boosts the…

Machine Learning · Computer Science 2024-10-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation

We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set…

Robotics · Computer Science 2022-11-15 Srivatsan Krishnan, Behzad Boroujerdian, William Fu, Aleksandra Faust, Vijay Janapa Reddi

Deep-learning enhancement of large scale numerical simulations

Traditional simulations on High-Performance Computing (HPC) systems typically involve modeling very large domains and/or very complex equations. HPC systems allow running large models, but limits in performance increase that have become…

Computational Engineering, Finance, and Science · Computer Science 2020-04-08 Caspar van Leeuwen, Damian Podareanu, Valeriu Codreanu, Maxwell X. Cai, Axel Berg, Simon Portegies Zwart, Robin Stoffer, Menno Veerman, Chiel van Heerwaarden, Sydney Otten, Sascha Caron, Cunliang Geng, Francesco Ambrosetti, Alexandre M. J. J. Bonvin