Related papers: LIKWID: Lightweight Performance Tools

200 papers

Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-07-01 Jan Treibig, Georg Hager, Gerhard Wellein

System monitoring is an established tool to measure the utilization and health of HPC systems. Usually system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-19 Thomas Röhl, Jan Eitzinger, Georg Hager, Gerhard Wellein

Despite the fact that computational fluid dynamics (CFD) software is now (relatively) fast and freely available, it is still amazingly difficult to use. Inaccessible software imposes a significant entry barrier on students and junior…

Computational Physics · Physics 2015-10-26 Gabriel D. Weymouth

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-06 Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

Dynamic program analysis is invaluable for malware detection, debugging, and performance profiling. However, software-based instrumentation incurs high overhead and can be evaded by anti-analysis techniques. In this paper, we propose…

Cryptography and Security · Computer Science 2025-10-21 Changyu Zhao, Yohan Beugin, Jean-Charles Noirot Ferrand, Quinn Burke, Guancheng Li, Patrick McDaniel

Estimating instruction-level throughput is critical for many applications: multimedia, low-latency networking, medical, automotive, avionic, and industrial control systems all rely on tightly calculable and accurate timing bounds of their…

Programming Languages · Computer Science 2023-05-18 Min-Yih Hsu, Felicitas Hetzelt, David Gens, Michael Maitland, Michael Franz

Despite the de-facto technological uniformity fostered by the cloud and edge computing paradigms, resource fragmentation across isolated clusters hinders the dynamism in application placement, leading to suboptimal performance and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-26 Marco Iorio, Fulvio Risso, Alex Palesandro, Leonardo Camiciotti, Antonio Manzalini

Containerization has emerged as a revolutionary technology in the software development and deployment industry. Containers offer a portable and lightweight solution that allows for packaging applications and their dependencies…

Cryptography and Security · Computer Science 2024-05-14 Md Sadun Haq, Ali Saman Tosun, Turgay Korkmaz

A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms, where the actual hardware characteristics are hidden from both the end user and the tools that mediate execution, such as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-10 Keith Cooper, Xiaoran Xu

We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two…

Hardware Architecture · Computer Science 2023-05-30 Madhav P. Desai

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-03 Denis Los, Igor Petushkov

Serverless computing has emerged as a pivotal paradigm for deploying Deep Learning (DL) models, offering automatic scaling and cost efficiency. However, the inherent cold start problem in serverless ML inference systems, particularly the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-01 Z. Wu, Y. Deng, J. Hu, L. Cui, Z. Zhang, L. Zeng, G. Min

This paper presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-08 Varik Hoang, Ling-Hong Hung, David Perez, Huazeng Deng, Raymond Schooley, Niharika Arumilli, Ka Yee Yeung, Wes Lloyd

Many tools and libraries employ hardware performance monitoring (HPM) on modern processors, and using this data for performance assessment and as a starting point for code optimizations is very popular. However, such data is only useful if…

Performance · Computer Science 2013-02-20 Jan Treibig, Georg Hager, Gerhard Wellein

The key to speeding up applications is often understanding where the elapsed time is spent, and why. This document reviews in depth the full array of performance analysis tools and techniques available on Linux for this task, from the…

Performance · Computer Science 2007-05-23 Michel R. Dagenais, Karim Yaghmour, Charles Levert, Makan Pourzandi

To support growing massive parallelism, functional components and also the capabilities of current processors are changing and continue to do so. Today's computers are built upon multiple processing cores and run applications consisting of a…

Programming Languages · Computer Science 2016-04-07 Somnath Mazumdar, Roberto Giorgi

Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-21 Renan Souza, Tyler J. Skluzacek, Sean R. Wilkinson, Maxim Ziatdinov, Rafael Ferreira da Silva

There is a growing interest in the development of lidar-based autonomous mobility and Intelligent Transportation Systems (ITS). To operate and research on lidar data, researchers often develop code specific to their application niche. This…

Computer Vision and Pattern Recognition · Computer Science 2025-09-04 Muhammad Shahbaz, Shaurya Agarwal

Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Abhinav Bhatele, Rakrish Dhakal, Alexander Movsesyan, Aditya K. Ranjan, Onur Cankur

Applications to process seismic data employ scalable parallel systems to produce timely results. To fully exploit emerging processor architectures, applications will need to employ threaded parallelism within a node and message passing…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-15 Sri Raj Paul, John Mellor-Crummey, Mauricio Araya-Polo, Detlef Hohl