Related papers: A Roofline Visualization Framework
This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit for broader support of a range of…
This paper surveys a range of methods to collect necessary performance data on Intel CPUs and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have…
In this short paper, we introduce the Ridgeline model, an extension of the Roofline model [4] for distributed systems. The Roofline model targets shared memory systems, bounding the performance of a kernel based on its operational…
Deep learning applications are usually very compute-intensive and require a long run time for training and inference. This has been tackled by researchers from both hardware and software sides, and in this paper, we propose a Roofline-based…
The rapidly growing importance of Machine Learning (ML) applications, coupled with their ever-increasing model size and inference energy footprint, has created a strong need for specialized ML hardware architectures. Numerous ML…
We present a bottleneck analysis tool for designing compute systems for autonomous Unmanned Aerial Vehicles (UAV). The tool provides insights by exploiting the fundamental relationships between various components in the autonomous UAV such…
Energy consumption has been a great deal of concern in recent years and developers need to take energy-efficiency into account when they design algorithms. Their design needs to be energy-efficient and low-power while it tries to achieve…
Software visualization seeks to represent software artifacts graphical-ly in two or three dimensions, with the goal of enhancing comprehension, anal-ysis, maintenance, and evolution of the source code. In this context, visualiza-tions…
As RISC-V architectures proliferate across embedded and high-performance domains, developers face persistent challenges in performance optimization due to fragmented tooling, immature hardware features, and platform-specific defects. This…
We introduce an early-phase bottleneck analysis and characterization model called the F-1 for designing computing systems that target autonomous Unmanned Aerial Vehicles (UAVs). The model provides insights by exploiting the fundamental…
A robust evaluation toolset has been designed for Naval Research Laboratory's Real-Time Ocean Forecasting System RELO with the purpose of facilitating an adaptive sampling strategy and providing more educated guidance for routing underwater…
Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and…
Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the…
This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets,…
Interactive data visualization is a major part of modern exploratory data analysis, with web-based technologies enabling a rich ecosystem of both specialized and general tools. However, current visualization tools often lack support for…
We propose a novel and flexible roof modeling approach that can be used for constructing planar 3D polygon roof meshes. Our method uses a graph structure to encode roof topology and enforces the roof validity by optimizing a simple but…
Processing large amounts of data to extract useful information is an essential task within companies. To help in this task, visualization techniques have been commonly used due to their capacity to present data in synthesized views, easier…
There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to…
The aim of this paper is to develop an approach to visualizations that benefits from distributed computing. Three schemes of process distribution are considered: parallel, pipeline, and expanding pipeline computations. Expanding pipeline…
Any data analysis, especially the data sets that may be changing often or in real-time, consists of at least three important synchronized components: i) figuring out what to infer (objectives), ii) analysis or computation of objectives, and…