Related papers: Time-Based Roofline for Deep Learning Performance …

200 papers

This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit for broader support of a range of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-26 Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

In this paper, we present a methodology for creating Roofline models automatically for Non-Uniform Memory Access (NUMA) using Intel Xeon as an example. Finally, we present an evaluation of highly efficient deep learning primitives as…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Jacek Czaja, Michal Gallus, Joanna Wozna, Adam Grygielski, Luo Tao

In this short paper, we introduce the Ridgeline model, an extension of the Roofline model [4] for distributed systems. The Roofline model targets shared memory systems, bounding the performance of a kernel based on its operational…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Fabio Checconi, Jesmin Jahan Tithi, Fabrizio Petrini

We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks. The Sparsity Roofline jointly models network accuracy, sparsity, and theoretical inference speedup. Our approach does not require…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Cameron Shinn, Collin McCarthy, Saurav Muralidharan, Muhammad Osama, John D. Owens

Learning effective configurations in computer systems without hand-crafting models for every parameter is a long-standing problem. This paper investigates the use of deep reinforcement learning for runtime parameters of cloud databases…

Machine Learning · Computer Science 2016-11-01 Michael Schaarschmidt, Felix Gessert, Valentin Dalibard, Eiko Yoneki

The rapidly growing importance of Machine Learning (ML) applications, coupled with their ever-increasing model size and inference energy footprint, has created a strong need for specialized ML hardware architectures. Numerous ML…

Hardware Architecture · Computer Science 2025-05-26 Marian Verhelst, Luca Benini, Naveen Verma

Performance modelling of a deep learning application is essential to improve and quantify the efficiency of the model framework. However, existing performance models are mostly case-specific, with limited capability for the new deep…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-22 Tulasi Kavarakuntla, Liangxiu Han, Huw Lloyd, Annabel Latham, Anthony Kleerekoper, Samson B. Akintoye

Deep learning applications are computation-intensive and often employ GPUs as the underlying computing devices. Deep learning frameworks provide powerful programming interfaces, but the gap between source code and practical GPU operations…

Software Engineering · Computer Science 2017-07-13 Jiazhen Gu, Huan Liu, Yangfan Zhou, Xin Wang

Deep learning models are widely used across computer vision and other domains. During model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…

Machine Learning · Computer Science 2026-01-06 Yen-Chia Chen, Hsing-Kuo Pao, Hanjuan Huang

This paper surveys a range of methods to collect necessary performance data on Intel CPUs and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Charlene Yang

Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy-intensive due to high data rates and reliance on complex inference algorithms, which limits its…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert P. Dick, Dongsheng Li, Tun Lu, Ning Gu, Li Shang

Performance optimization of deep learning models is conducted either manually or through automatic architecture search, or a combination of both. On the other hand, their performance strongly depends on the target hardware and how…

Machine Learning · Computer Science 2022-09-23 Vahid Partovi Nia, Alireza Ghaffari, Mahdi Zolnouri, Yvon Savaria

Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Jacob Odgård Tørring, Jan Christian Meyer, Anne C. Elster

Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-11 Matthew Leinhauser, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, Sunita Chandrasekaran

Deep learning is pervasive in our daily life, including self-driving cars, virtual assistants, social network services, healthcare services, face recognition, etc. However, deep neural networks demand substantial compute resources during…

Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in…

Machine Learning · Computer Science 2020-04-07 Pedro Lara-Benítez, Manuel Carranza-García, Francisco Martínez-Álvarez, José C. Riquelme

Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity, we are still unable to accurately predict…

Machine Learning · Computer Science 2018-11-30 Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough

The success of modern deep learning is attributed to two key elements: huge amounts of training data and large model sizes. Where a vast amount of data allows the model to learn more features, the large model architecture boosts the…

Machine Learning · Computer Science 2024-10-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set…

Robotics · Computer Science 2022-11-15 Srivatsan Krishnan, Behzad Boroujerdian, William Fu, Aleksandra Faust, Vijay Janapa Reddi

Traditional simulations on High-Performance Computing (HPC) systems typically involve modeling very large domains and/or very complex equations. HPC systems allow running large models, but limits in performance increase that have become…
