Related papers: Automatic Tracing in Task-Based Runtime Systems
Detecting performance issues and identifying their root causes in the runtime is a challenging task. Typically, developers use methods such as logging and tracing to identify bottlenecks. These solutions are, however, not ideal as they are…
Execution of concurrent programs implies frequent switching between different thread contexts. This property perplexes analyzing and reasoning about concurrent programs. Trace simplification is a technique that aims at alleviating this…
Modern program runtime is dominated by segments of repeating code called kernels. Kernels are accelerated by increasing memory locality, increasing data-parallelism, and exploiting producer-consumer parallelism among kernels - which…
An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data…
To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…
This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…
Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describes the relationship between the performance of a system and its runtime activities. This process…
Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. As parallelism increases, scheduling overhead may transition…
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks either base on profiling or…
The execution of concurrent programs generally involves some degree of nondeterminism, mostly due to the relative speeds of the concurrent processes. As a consequence, reproducibility is often challenging. This problem has been…
Modern software projects include automated tests written to check the programs' functionality. The set of functions invoked by a test is called the trace of the test, and the action of obtaining a trace is called tracing. There are many…
This article is concerned with automated complexity analysis of term rewrite systems. Since these systems underlie much of declarative programming, time complexity of functions defined by rewrite systems is of particular interest. Among…
Modern software systems have become increasingly complex, which makes them difficult to test and validate. Detecting software partial anomalies in complex systems at runtime can assist with handling unintended software behaviors, avoiding…
Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program…
There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…
Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during the program…
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large…
Real-time embedded systems require precise timing and fault detection to ensure correct behavior. Traditional tracing tools often rely on local desktops with limited processing and storage capabilities, which hampers large-scale analysis.…
Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks…
Automatic parallelization improves the performance of serial program by automatically converting to parallel program. Automatic parallelization typically works in three phases: check for data dependencies in the input program, perform…