Related papers: SPARK00: A Benchmark Package for the Compiler Eval…
Optimizing scientific applications to take full advan-tage of modern memory subsystems is a continual challenge forapplication and compiler developers. Factors beyond working setsize affect performance. A benchmark framework that…
Context: Performance regressions negatively impact execution time and memory usage of software systems. Nevertheless, there is a lack of systematic methods to evaluate the effectiveness of performance test suites. Performance mutation…
As high-performance computing (HPC) systems rapidly evolve, with increasing on-node parallelism and widespread use of accelerators, understanding how the code maps to hardware is essential for reaching optimal performance. Benchmarks are…
Throughput-oriented computing via co-running multiple applications in the same machine has been widely adopted to achieve high hardware utilization and energy saving on modern supercomputers and data centers. However, efficiently co-running…
Irregular memory access patterns pose performance and user productivity challenges on distributed-memory systems. They can lead to fine-grained remote communication and the data access patterns are often not known until runtime. The…
Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…
This paper describes a new benchmark tool, Spatter, for assessing memory system architectures in the context of a specific category of indexed accesses known as gather and scatter. These types of operations are increasingly used to express…
Despite extensive testing and correctness certification of their functional semantics, a number of compiler optimizations have been shown to violate security guarantees implemented in source code. While prior work has shed light on how such…
For scientific software, especially those used for large-scale simulations, achieving good performance and efficiently using the available hardware resources is essential. It is important to regularly perform benchmarks to ensure the…
Some OpenMP multi-threaded applications increasingly suffer from performance anomaly owning to shared resource contention as well as software- and hardware-related problems. Such performance anomaly can result in failure and inefficiencies,…
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…
This report presents the design of the Scope infrastructure for extensible and portable benchmarking. Improvements in high- performance computing systems rely on coordination across different levels of system abstraction. Developing and…
Irregular applications comprise an increasingly important workload domain for many fields, including bioinformatics, chemistry, physics, social sciences and machine learning. Therefore, achieving high performance and energy efficiency in…
Leveraging spatial sparsity has become a popular approach to accelerate 3D computer graphics applications. Spatially sparse data structures and efficient sparse kernels (such as parallel stencil operations on active voxels), are key to…
As computing system become more complex, it is becoming harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many architecture-based optimization details as…
Recurrence equations lie at the heart of many computational paradigms including dynamic programming, graph analysis, and linear solvers. These equations are often expensive to compute and much work has gone into optimizing them for…
Sparse matrix computation is crucial in various modern applications, including large-scale graph analytics, deep learning, and recommender systems. The performance of sparse kernels varies greatly depending on the structure of the input…
Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to fully…
Analyzing IO performance anomalies is a crucial task in various computing environments, ranging from large-scale cloud applications to desktop applications. However, the IO stack of modern operating systems is complicated, making it hard to…
Tensor computations present significant performance challenges that impact a wide spectrum of applications ranging from machine learning, healthcare analytics, social network analysis, data mining to quantum chemistry and signal processing.…