Related papers: A Type-Oriented Graph500 Benchmark
The Partitioned Global Address Space memory model has been popularised by a number of languages and applications. However this abstraction can often result in the programmer having to rely on some in built choices and with this implicit…
Applications in High-Performance Computing (HPC) environments face challenges due to increasing complexity. Among them, the increasing usage of sparse data pushes the limits of data structures and programming models and hampers the…
Using large-scale multicore systems to get the maximum performance and energy efficiency with manageable programmability is a major challenge. The partitioned global address space (PGAS) programming model enhances programmability by…
Whilst there have been great advances in HPC hardware and software in recent years, the languages and models that we use to program these machines have remained much more static. This is not from a lack of effort, but instead by virtue of…
Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like…
Open-source matters, not just to the current cohort of HPC users but also to potential new HPC communities, such as machine learning, themselves often rooted in open-source. Many of these potential new workloads are, by their very nature,…
GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…
The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model and the easy-to-use, but locality-agnostic, shared memory model. However, the PGAS rich memory…
The HPEC Graph Challenge is a collection of benchmarks representing complex workloads that test the hardware and software components of HPC systems, which traditional benchmarks, such as LINPACK, do not. The first benchmark, Subgraph…
In this paper we evaluate the performance of Unified Parallel C (which implements the partitioned global address space programming model) using a numerical method that is widely used in fluid dynamics. In order to evaluate the incremental…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
Local search is a successful approach for solving combinatorial optimization and constraint satisfaction problems. With the progressing move toward multi and many-core systems, GPUs and the quest for Exascale systems, parallelism has become…
In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…
A type system is introduced for a generic Object Oriented programming language in order to infer resource upper bounds. A sound andcomplete characterization of the set of polynomial time computable functions is obtained. As a consequence,…
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…
Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…
A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this…
Processing massive application graphs on distributed memory systems requires to map the graphs onto the system's processing elements (PEs). This task becomes all the more important when PEs have non-uniform communication costs or the input…
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…
High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…