Related papers: Performance Debugging through Microarchitectural S…

200 papers

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make…

Performance · Computer Science 2024-02-27 Hugo Pompougnac, Alban Dutilleul, Christophe Guillon, Nicolas Derumigny, Fabrice Rastello

Bottleneck evaluation plays a crucial part in performance tuning of HPC applications, as it directly influences the search for optimizations and the selection of the best hardware for a given code. In this paper, we introduce a new…

Performance · Computer Science 2025-09-11 Aurélien Delval, Pablo de Oliveira Castro, William Jalby, Etienne Renault

This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…

Software Engineering · Computer Science 2021-03-09 Naser Ezzati-Jivan, Quentin Fournier, Michel R. Dagenais, Abdelwahab Hamou-Lhadj

Modern microarchitectures are some of the world's most complex man-made systems. As a consequence, it is increasingly difficult to predict, explain, let alone optimize the performance of software running on such microarchitectures. As a…

Performance · Computer Science 2019-03-06 Andreas Abel, Jan Reineke

Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU utilization or application-level metrics,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-27 Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth

Automatic performance debugging of parallel applications usually involves two steps: automatic detection of performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-24 Xu Liu, Lin Yuan, Jianfeng Zhan, Bibo Tu, Dan Meng

Diagnosing performance bottlenecks in modern software is essential yet challenging, particularly as applications become more complex and rely on custom resource management policies. While traditional profilers effectively identify execution…

Performance · Computer Science 2025-07-10 Yigong Hu, Haodong Zheng, Yicheng Liu, Dedong Xie, Youliang Huang, Baris Kasikci

We present a new tool, GPA, that can generate key performance measures for very large systems. Based on solving systems of ordinary differential equations (ODEs), this method of performance analysis is far more scalable than stochastic…

Performance · Computer Science 2010-06-29 Anton Stefanek, Richard Hayden, Jeremy Bradley

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-05-14 Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, Nathalie Drach

Performance problems are often observed in embedded software systems. The reasons for poor performance are frequently not obvious. Bottlenecks can occur in any of the software components along the execution path. Therefore it is important…

Software Engineering · Computer Science 2007-05-23 Edu Metz, Raimondas Lencevicius

Processor design validation and debug is a difficult and complex task, which consumes the lion's share of the design process. Design bugs that affect processor performance rather than its functionality are especially difficult to catch,…

Hardware Architecture · Computer Science 2020-11-20 Erick Carvajal Barboza, Sara Jacob, Mahesh Ketkar, Michael Kishinevsky, Paul Gratz, Jiang Hu

Edge computing has emerged as a pivotal technology, offering significant advantages such as low latency, enhanced data security, and reduced reliance on centralized cloud infrastructure. These benefits are crucial for applications requiring…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-24 Tomasz Szydlo, Viacheslav Horbanov, Devki Nandan Jha, Shashikant Ilager, Aleksander Slominski, Rajiv Ranjan

The aim of parallel computing is to increase an application's performance by executing the application on multiple processors. OpenMP is an API that supports a multi-platform shared-memory programming model, and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput, Alok Katiyar

The proliferation of large language models has driven demand for long-context inference on resource-constrained edge platforms. However, deploying these models on Neural Processing Units (NPUs) presents significant challenges due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-18 Neelesh Gupta, Rakshith Jayanth, Dhruv Parikh, Viktor Prasanna

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…

Performance · Computer Science 2014-10-01 Elmar Peise, Paolo Bientinesi

Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…

Hardware Architecture · Computer Science 2026-03-10 Rahul Bera

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z. Islam, Line Pouchard

Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance…

Software Engineering · Computer Science 2023-04-10 Luca Traini, Vittorio Cortellessa

After a machine learning (ML)-based system is deployed, monitoring its performance is important to ensure the safety and effectiveness of the algorithm over time. When an ML algorithm interacts with its environment, the algorithm can affect…

The single program, multiple data (SPMD) programming model is widely used for both high-performance computing and cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-04-01 Xu Liu, Jianfeng Zhan, Kunlin Zhan, Weisong Shi, Lin Yuan, Dan Meng, Lei Wang