Related papers: Performance Debugging through Microarchitectural S…

Performance bottlenecks detection through microarchitectural sensitivity

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make…

Performance · Computer Science 2024-02-27 Hugo Pompougnac, Alban Dutilleul, Christophe Guillon, Nicolas Derumigny, Fabrice Rastello

Noise Injection for Performance Bottleneck Analysis

Bottleneck evaluation plays a crucial part in performance tuning of HPC applications, as it directly influences the search for optimizations and the selection of the best hardware for a given code. In this paper, we introduce a new…

Performance · Computer Science 2025-09-11 Aurélien Delval, Pablo de Oliveira Castro, William Jalby, Etienne Renault

DepGraph: Localizing Performance Bottlenecks in Multi-Core Applications Using Waiting Dependency Graphs and Software Tracing

This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…

Software Engineering · Computer Science 2021-03-09 Naser Ezzati-Jivan, Quentin Fournier, Michel R. Dagenais, Abdelwahab Hamou-Lhadj

uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures

Modern microarchitectures are some of the world's most complex man-made systems. As a consequence, it is increasingly difficult to predict, explain, let alone optimize the performance of software running on such microarchitectures. As a…

Performance · Computer Science 2019-03-06 Andreas Abel, Jan Reineke

Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks

Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU utilization or application-level metrics,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-27 Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth

Automatic Performance Debugging of SPMD Parallel Programs

Automatic performance debugging of parallel applications usually involves two steps: automatic detection of performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-24 Xu Liu, Lin Yuan, Jianfeng Zhan, Bibo Tu, Dan Meng

gigiProfiler: Diagnosing Performance Issues by Uncovering Application Resource Bottlenecks

Diagnosing performance bottlenecks in modern software is essential yet challenging, particularly as applications become more complex and rely on custom resource management policies. While traditional profilers effectively identify execution…

Performance · Computer Science 2025-07-10 Yigong Hu, Haodong Zheng, Yicheng Liu, Dedong Xie, Youliang Huang, Baris Kasikci

A new tool for the performance analysis of massively parallel computer systems

We present a new tool, GPA, that can generate key performance measures for very large systems. Based on solving systems of ordinary differential equations (ODEs), this method of performance analysis is far more scalable than stochastic…

Performance · Computer Science 2010-06-29 Anton Stefanek, Richard Hayden, Jeremy Bradley

Automatic Detection of Performance Anomalies in Task-Parallel Programs

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-05-14 Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, Nathalie Drach

A Performance Analysis Tool for Nokia Mobile Phone Software

Performance problems are often observed in embedded software systems. The reasons for poor performance are frequently not obvious. Bottlenecks can occur in any of the software components along the execution path. Therefore it is important…

Software Engineering · Computer Science 2007-05-23 Edu Metz, Raimondas Lencevicius

Automatic Microprocessor Performance Bug Detection

Processor design validation and debug is a difficult and complex task, which consumes the lion's share of the design process. Design bugs that affect processor performance rather than its functionality are especially difficult to catch,…

Hardware Architecture · Computer Science 2020-11-20 Erick Carvajal Barboza, Sara Jacob, Mahesh Ketkar, Michael Kishinevsky, Paul Gratz, Jiang Hu

Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems

Edge computing has emerged as a pivotal technology, offering significant advantages such as low latency, enhanced data security, and reduced reliance on centralized cloud infrastructure. These benefits are crucial for applications requiring…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-24 Tomasz Szydlo, Viacheslav Horbanov, Devki Nandan Jha, Shashikant Ilager, Aleksander Slominski, Rajiv Ranjan

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput, Alok Katiyar

Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units

The proliferation of large language models has driven demand for long-context inference on resource-constrained edge platforms. However, deploying these models on Neural Processing Units (NPUs) presents significant challenges due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-18 Neelesh Gupta, Rakshith Jayanth, Dhruv Parikh, Viktor Prasanna

Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…

Performance · Computer Science 2014-10-01 Elmar Peise, Paolo Bientinesi

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques

Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…

Hardware Architecture · Computer Science 2026-03-10 Rahul Bera

A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z. Islam, Line Pouchard

DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-based Systems

Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance…

Software Engineering · Computer Science 2023-04-10 Luca Traini, Vittorio Cortellessa

Designing monitoring strategies for deployed machine learning algorithms: navigating performativity through a causal lens

After a machine learning (ML)-based system is deployed, monitoring its performance is important to ensure the safety and effectiveness of the algorithm over time. When an ML algorithm interacts with its environment, the algorithm can affect…

Machine Learning · Computer Science 2024-02-27 Jean Feng, Adarsh Subbaswamy, Alexej Gossmann, Harvineet Singh, Berkman Sahiner, Mi-Ok Kim, Gene Pennello, Nicholas Petrick, Romain Pirracchio, Fan Xia

Automatic Performance Debugging of SPMD-style Parallel Programs

The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-04-01 Xu Liu, Jianfeng Zhan, Kunlin Zhan, Weisong Shi, Lin Yuan, Dan Meng, Lei Wang