Related papers: Compiler Optimization for Irregular Memory Access …

Hardware Support for Address Mapping in PGAS Languages; a UPC Case Study

The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model and the easy-to-use, but locality-agnostic, shared memory model. However, the PGAS rich memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-09-11 Olivier Serres, Abdullah Kayi, Ahmad Anbar, Tarek El-Ghazawi

A Scalable Actor-based Programming System for PGAS Runtimes

The PGAS model is well suited for executing irregular applications on cluster-based systems, due to its efficient support for short, one-sided messages. However, there are currently two major limitations faced by PGAS applications. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-22 Sri Raj Paul, Akihiro Hayashi, Kun Chen, Vivek Sarkar

A PGAS Communication Library for Heterogeneous Clusters

This work presents a heterogeneous communication library for clusters of processors and FPGAs. This library, Shoal, supports the Partitioned Global Address Space (PGAS) memory model for applications. PGAS is a shared memory model for…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-27 Varun Sharma, Paul Chow

A Theory of Partitioned Global Address Spaces

Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like…

Logic in Computer Science · Computer Science 2013-07-26 Georgel Calin, Egor Derevenetc, Rupak Majumdar, Roland Meyer

A highly scalable particle tracking algorithm using partitioned global address space (PGAS) programming for extreme-scale turbulence simulations

A new parallel algorithm utilizing partitioned global address space (PGAS) programming model to achieve high scalability is reported for particle tracking in direct numerical simulations of turbulent flow. The work is motivated by the…

Computational Physics · Physics 2020-05-28 Dhawal Buaria, P. K. Yeung

On the Performance and Energy Efficiency of the PGAS Programming Model on Multicore Architectures

Using large-scale multicore systems to get the maximum performance and energy efficiency with manageable programmability is a major challenge. The partitioned global address space (PGAS) programming model enhances programmability by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Jérémie Lagravière, Johannes Langguth, Mohammed Sourouri, Phuong H. Ha, Xing Cai

Re-thinking Memory-Bound Limitations in CGRAs

Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at…

Hardware Architecture · Computer Science 2025-08-28 Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, Xiaomeng Han, Mingsong Lyu, Qingxu Deng, Nan Guan

Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems

The relaxed semantics and rich functionality of one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such an implementation suffers from the fact that…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-08 Huan Zhou, Kamran Idrees, José Gracia

Paving the way for Distributed Non-Blocking Algorithms and Data Structures in the Partitioned Global Address Space

The partitioned global address space has bridged the gap between shared and distributed memory, and with this bridge comes the ability to adapt shared memory concepts, such as non-blocking programming, to distributed systems such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-30 Garvit Dewan, Louis Jenkins

Improving Communication Patterns in Polyhedral Process Networks

Embedded system performance is bounded by power consumption. The trend is to offload greedy computations to hardware accelerators such as GPUs, Xeon Phi or FPGAs. FPGA chips combine both the flexibility of programmable chips and the energy efficiency of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Christophe Alias

SPARK00: A Benchmark Package for the Compiler Evaluation of Irregular/Sparse Codes

We propose a set of benchmarks that specifically targets a major cause of performance degradation in high performance computing platforms: irregular access patterns. These benchmarks are meant to be used to assess the performance of…

Performance · Computer Science 2008-05-27 H. L. A. van der Spek, E. M. Bakker, H. A. G. Wijshoff

Performance optimization and modeling of fine-grained irregular communication in UPC

The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Jérémie Lagravière, Johannes Langguth, Martina Prugger, Lukas Einkemmer, Phuong H. Ha, Xing Cai

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-08 Huan Zhou, Yousri Mhedheb, Kamran Idrees, Colin W. Glass, José Gracia, Karl Fürlinger, Jie Tao

Performance Evaluation of Unified Parallel C for Molecular Dynamics

Partitioned Global Address Space (PGAS) integrates the concepts of shared memory programming and the control of data distribution and locality provided by message passing into a single parallel programming model. The purpose of allying…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-15 Kamran Idrees, Christoph Niethammer, Aniello Esposito, Colin W. Glass

Parallel Local Search: Experiments with a PGAS-based programming model

Local search is a successful approach for solving combinatorial optimization and constraint satisfaction problems. With the progressing move toward multi and many-core systems, GPUs and the quest for Exascale systems, parallelism has become…

Programming Languages · Computer Science 2013-05-13 Rui Machado, Salvador Abreu, Daniel Diaz

Scaling Shared-Memory Data Structures as Distributed Global-View Data Structures in the Partitioned Global Address Space model

The Partitioned Global Address Space (PGAS), a memory model in which the global address space is explicitly partitioned across compute nodes in a cluster, strives to bridge the gap between shared-memory and distributed-memory programming.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-02 Garvit Dewan, Louis Jenkins

DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms

We present DASH, a C++ template library that offers distributed data structures and parallel algorithms and implements a compiler-free PGAS (partitioned global address space) approach. DASH offers many productivity and performance features…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-06 Karl Fürlinger, Tobias Fuchs, Roger Kowalewski

Online Sketch-based Query Optimization

Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a large range of statistical synopses -- including attribute-level histograms…

Databases · Computer Science 2021-02-05 Yesdaulet Izenov, Asoke Datta, Florin Rusu, Jun Hyung Shin

UpDown: Programmable fine-grained Events for Scalable Performance on Irregular Applications

Applications with irregular data structures, data-dependent control flows and fine-grained data transfers (e.g., real-world graph computations) perform poorly on cache-based systems. We propose the UpDown accelerator that supports…

Hardware Architecture · Computer Science 2024-07-31 Andronicus Rajasukumar, Jiya Su, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Jose M Monsalve Diaz, Tianchi Zhang, Jianru Ding, Wenyi Wang, Ziyi Zhang, Moubarak Jeje, Henry Hoffmann, Yanjing Li, Andrew A. Chien

Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

Maximizing parallelism level in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-29 Zahra Khatami, Hartmut Kaiser, J. Ramanujam