Related papers: A Fast Causal Profiler for Task Parallel Programs

200 papers

The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems…

Programming Languages · Computer Science 2011-09-08 Paul Bone, Zoltan Somogyi

We present a parallel profiling tool, GAPP, that identifies serialization bottlenecks in parallel Linux applications arising from load imbalance or contention for shared resources. It works by tracing kernel context switch events using…

Performance · Computer Science 2020-04-14 Reena Nair, Tony Field

Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads.…

Programming Languages · Computer Science 2011-09-08 Paul Bone, Zoltan Somogyi, Peter Schachte

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-05-14 Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, Nathalie Drach

We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple…

There are billions of lines of sequential code in today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 M. Maronas, K. Sala, S. Mateo, E. Ayguadé, V. Beltran (Barcelona Supercomputing Center)

This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…

Software Engineering · Computer Science 2021-03-09 Naser Ezzati-Jivan, Quentin Fournier, Michel R. Dagenais, Abdelwahab Hamou-Lhadj

Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-15 J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

In this study, we introduce a methodology for automatically transforming user applications in the radar and communication domain written in C/C++ based on dynamic profiling to a parallel representation targeted for a heterogeneous SoC. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-29 Liangliang Chang, Joshua Mack, Benjamin Willis, Xing Chen, John Brunhaver, Ali Akoglu, Chaitali Chakrabarti

Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique…

Performance · Computer Science 2023-11-07 Ziyang Xu, Yebin Chon, Yian Su, Zujun Tan, Sotiris Apostolakis, Simone Campanoni, David I. August

Task-based parallel programming has shown competitive outcomes in many aspects of parallel programming such as efficiency, performance, productivity and scalability. Different approaches are used by different software development frameworks…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-09 Afshin Zafari

Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large…

Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-12-15 Fuad Abujarad, Borzoo Bonakdarpour, Sandeep S. Kulkarni

Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The PC algorithm is the state-of-the-art constraint based method for causal discovery. However, runtime of the PC…

Artificial Intelligence · Computer Science 2016-11-11 Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, Huawen Liu

The aim of parallel computing is to increase an application's performance by executing the application on multiple processors. OpenMP is an API that supports a multi-platform shared-memory programming model, and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput, Alok Katiyar

Achieving optimal performance for parallel applications on the NUMA architecture is extremely challenging, which necessitates the assistance of profiling tools. However, existing NUMA-profiling tools share some similar shortcomings, such…

Performance · Computer Science 2021-02-11 Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z. Islam, Line Pouchard