Related papers: A Fast Causal Profiler for Task Parallel Programs

Profiling parallel Mercury programs with ThreadScope

The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems…

Programming Languages · Computer Science · 2011-09-08 · Paul Bone, Zoltan Somogyi

GAPP: A Fast Profiler for Detecting Serialization Bottlenecks in Parallel Linux Applications

We present a parallel profiling tool, GAPP, that identifies serialization bottlenecks in parallel Linux applications arising from load imbalance or contention for shared resources. It works by tracing kernel context switch events using…

Performance · Computer Science · 2020-04-14 · Reena Nair, Tony Field

Estimating the overlap between dependent computations for automatic parallelization

Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads.…

Programming Languages · Computer Science · 2011-09-08 · Paul Bone, Zoltan Somogyi, Peter Schachte

Automatic Detection of Performance Anomalies in Task-Parallel Programs

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-05-14 · Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, Nathalie Drach

Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-11-19 · Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Wonchan Lee, Qinglei Cao, George Bosilca, Seema Mirchandaney, Sean Treichler, Patrick McCormick, Alex Aiken

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science · 2016-04-13 · Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-08 · M. Maronas, K. Sala, S. Mateo, E. Ayguadé, V. Beltran (Barcelona Supercomputing Center)

DepGraph: Localizing Performance Bottlenecks in Multi-Core Applications Using Waiting Dependency Graphs and Software Tracing

This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…

Software Engineering · Computer Science · 2021-03-09 · Naser Ezzati-Jivan, Quentin Fournier, Michel R. Dagenais, Abdelwahab Hamou-Lhadj

TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks

Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-08-15 · J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-09-08 · Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

Profile-Guided Parallel Task Extraction and Execution for Domain Specific Heterogeneous SoC

In this study, we introduce a methodology for automatically transforming user applications in the radar and communication domain written in C/C++ based on dynamic profiling to a parallel representation targeted for a heterogeneous SoC. We…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-11-29 · Liangliang Chang, Joshua Mack, Benjamin Willis, Xing Chen, John Brunhaver, Ali Akoglu, Chaitali Chakrabarti

PROMPT: A Fast and Extensible Memory Profiling Framework

Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique…

Performance · Computer Science · 2023-11-07 · Ziyang Xu, Yebin Chon, Yian Su, Zujun Tan, Sotiris Apostolakis, Simone Campanoni, David I. August

TaskUniVerse: A Task-Based Unified Interface for Versatile Parallel Execution

Task based parallel programming has shown competitive outcomes in many aspects of parallel programming such as efficiency, performance, productivity and scalability. Different approaches are used by different software development frameworks…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-05-09 · Afshin Zafari

Traveler: Navigating Task Parallel Traces for Performance Analysis

Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large…

Human-Computer Interaction · Computer Science · 2022-09-07 · Sayef Azad Sakin, Alex Bigelow, R. Tohid, Connor Scully-Allison, Carlos Scheidegger, Steven R. Brandt, Christopher Taylor, Kevin A. Huck, Hartmut Kaiser, Katherine E. Isaacs

Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock…

Distributed, Parallel, and Cluster Computing · Computer Science · 2009-12-15 · Fuad Abujarad, Borzoo Bonakdarpour, Sandeep S. Kulkarni

A fast PC algorithm for high dimensional causal discovery with multi-core PCs

Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The PC algorithm is the state-of-the-art constraint based method for causal discovery. However, runtime of the PC…

Artificial Intelligence · Computer Science · 2016-11-11 · Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, Huawen Liu

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-11-12 · Vibha Rajput, Alok Katiyar

NumaPerf: Predictive and Full NUMA Profiling

Parallel applications are extremely challenging to achieve the optimal performance on the NUMA architecture, which necessitates the assistance of profiling tools. However, existing NUMA-profiling tools share some similar shortcomings, such…

Performance · Computer Science · 2021-02-11 · Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science · 2020-07-01 · Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-22 · Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z. Islam, Line Pouchard