Related papers: Machine Learning Framwork for Performance Anomaly …
This paper presents a novel heartbeat diagnosis regarding performance anomaly for OpenMP multi-threaded applications. First, we design injected heartbeat APIs for OpenMP multi-threaded applications. Then, we leverage the heartbeat sequences…
The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…
In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance…
The current trend of multicore architectures on shared memory systems underscores the need of parallelism. While there are some programming model to express parallelism, thread programming model has become a standard to support these system…
Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…
We present a framework based on Catch2 to evaluate performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architectures are currently undergoing rapid…
In financial field, a robust software system is of vital importance to ensure the smooth operation of financial transactions. However, many financial corporations still depend on operators to identify and eliminate the system failures when…
Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…
Optimizing scientific applications to take full advan-tage of modern memory subsystems is a continual challenge forapplication and compiler developers. Factors beyond working setsize affect performance. A benchmark framework that…
OpenMP has been the de facto standard for single node parallelism for more than a decade. Recently, asynchronous many-task runtime (AMT) systems have increased in popularity as a new programming paradigm for high performance computing…
Due to the veracity and heterogeneity in network traffic, detecting anomalous events is challenging. The computational load on global servers is a significant challenge in terms of efficiency, accuracy, and scalability. Our primary…
In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…
Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to…
As modern software systems continue to grow in terms of complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes more and more critical and challenging. In particular, the…
The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…
With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall…
State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…
We describe a universal modeling approach for predicting single- and multicore runtime of steady-state loops on server processors. To this end we strictly differentiate between application and machine models: An application model comprises…
In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…
In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by outlining the current state of the art in loop scheduling and presenting evidence that the existing OpenMP schedules are insufficient for…