Related papers: Machine Learning Framwork for Performance Anomaly …

Heartbeat Diagnosis of Performance Anomaly in OpenMP Multi-Threaded Systems

This paper presents a novel heartbeat diagnosis regarding performance anomaly for OpenMP multi-threaded applications. First, we design injected heartbeat APIs for OpenMP multi-threaded applications. Then, we leverage the heartbeat sequences…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-04 Weidong Wang, Wangda Luo

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput, Alok Katiyar

Analysis and Characterization of Performance Variability for OpenMP Runtime

In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-10 Minyu Cui, Nikela Papadopoulou, Miquel Pericàs

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need of parallelism. While there are some programming model to express parallelism, thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta, A. B. Mutiara

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors

Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Juan Carlos Saez, Fernando Castro, Manuel Prieto-Matias

A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading

We present a framework based on Catch2 to evaluate performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architectures are currently undergoing rapid…

Performance · Computer Science 2025-03-04 Mohammad Atif, Tianle Wang, Zhihua Dong, Charles Leggett, Meifeng Lin

An anomaly prediction framework for financial IT systems using hybrid machine learning methods

In financial field, a robust software system is of vital importance to ensure the smooth operation of financial transactions. However, many financial corporations still depend on operators to identify and eliminate the system failures when…

Machine Learning · Computer Science 2019-12-20 Jingwen Wang, Jingxin Liu, Juntao Pu, Qinghong Yang, Zhongchen Miao, Jian Gao, You Song

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

AdaptMemBench: Application-Specific Memory Subsystem Benchmarking

Optimizing scientific applications to take full advantage of modern memory subsystems is a continual challenge for application and compiler developers. Factors beyond working set size affect performance. A benchmark framework that…

Performance · Computer Science 2018-12-20 Mahesh Lakshminarasimhan, Catherine Olschanowsky

Supporting OpenMP 5.0 Tasks in hpxMP -- A study of an OpenMP implementation within Task Based Runtime Systems

OpenMP has been the de facto standard for single node parallelism for more than a decade. Recently, asynchronous many-task runtime (AMT) systems have increased in popularity as a new programming paradigm for high performance computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-20 Tianyi Zhang, Shahrzad Shirzad, Bibek Wagle, Adrian S. Lemoine, Patrick Diehl, Hartmut Kaiser

Network Anomaly Detection Using Federated Learning

Due to the veracity and heterogeneity in network traffic, detecting anomalous events is challenging. The computational load on global servers is a significant challenge in terms of efficiency, accuracy, and scalability. Our primary…

Machine Learning · Computer Science 2023-03-15 William Marfo, Deepak K. Tosh, Shirley V. Moore

Understanding and Optimizing the Performance of Distributed Machine Learning Applications on Apache Spark

In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-21 Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to…

Programming Languages · Computer Science 2023-03-17 Giorgis Georgakoudis, Konstantinos Parasyris, Chunhua Liao, David Beckingsale, Todd Gamblin, Bronis de Supinski

Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

As modern software systems continue to grow in terms of complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes more and more critical and challenging. In particular, the…

Software Engineering · Computer Science 2023-08-22 Jinyang Liu, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Cong Feng, Zengyin Yang, Michael R. Lyu

Benchmarking mixed-mode PETSc performance on high-performance architectures

The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-07-19 Michael Lange, Gerard Gorman, Michele Weiland, Lawrence Mitchell, Xiaohu Guo, James Southern

On the Performance of MPI-OpenMP on a 12 nodes Multi-core Cluster

With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-08-17 Abdelgadir Tageldin Abdelgadir, Al-Sakib Khan Pathan, Mohiuddin Ahmed

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

We describe a universal modeling approach for predicting single- and multicore runtime of steady-state loops on server processors. To this end we strictly differentiate between application and machine models: An application model comprises…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-30 Johannes Hofmann, Christie L. Alappat, Georg Hager, Dietmar Fey, Gerhard Wellein

Towards Autotuning of OpenMP Applications on Multicore Architectures

In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-17 Jakub Katarzyński, Maciej Cytowski

OpenMP Loop Scheduling Revisited: Making a Case for More Schedules

In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by outlining the current state of the art in loop scheduling and presenting evidence that the existing OpenMP schedules are insufficient for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-11 Florina M. Ciorba, Christian Iwainsky, Patrick Buder