Related papers: Applying Type Oriented Programming to the PGAS Mem…
Data intensive workloads have become a popular use of HPC in recent years and the question of how data scientists, who might not be HPC experts, can effectively program these machines is important to address. Whilst using models such as…
Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like…
Whilst there have been great advances in HPC hardware and software in recent years, the languages and models that we use to program these machines have remained much more static. This is not from a lack of effort, but instead by virtue of…
A type system is introduced for a generic Object Oriented programming language in order to infer resource upper bounds. A sound andcomplete characterization of the set of polynomial time computable functions is obtained. As a consequence,…
We describe a methodology for designing efficient parallel and distributed scientific software. This methodology utilizes sequences of mechanizable algebra--based optimizing transformations. In this study, we apply our methodology to the…
By providing highly efficient one-sided communication with globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile,…
The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model and the easy-to-use, but locality-agnostic, shared memory model. However, the PGAS rich memory…
Dependently typed programming languages such as Coq, Agda, Idris, and F*, allow programmers to write detailed specifications of their programs and prove their programs meet these specifications. However, these specifications can be violated…
Writing parallel codes is difficult and exhibits a fundamental trade-off between abstraction and performance. The high level language abstractions designed to simplify the complexities of parallelism make certain assumptions that impacts…
The framework Pure Type System (PTS) offers a simple and general approach to designing and formalizing type systems. However, in the presence of dependent types, there often exist certain acute problems that make it difficult for PTS to…
A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this…
Temporal prefetching shows promise for handling irregular memory access patterns, which are common in data-dependent and pointer-based data structures. Recent studies introduced on-chip metadata storage to reduce the memory traffic caused…
Local search is a successful approach for solving combinatorial optimization and constraint satisfaction problems. With the progressing move toward multi and many-core systems, GPUs and the quest for Exascale systems, parallelism has become…
pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…
Recent parameter-efficient finetuning (PEFT) techniques aim to improve over the considerable cost of fully finetuning large pretrained language models (PLM). As different PEFT techniques proliferate, it is becoming difficult to compare…
In recent years the computing landscape has seen an in- creasing shift towards specialized accelerators. Field pro- grammable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…
Programs with high levels of complexity often face challenges in adjusting execution parameters, particularly when these parameters vary based on the execution context. These dynamic parameters significantly impact the program's…
Scalable and efficient numerical simulations continue to gain importance, as computation is firmly established as the third pillar of discovery, alongside theory and experiment. Meanwhile, the performance of computing hardware grows through…
Emerging hybrid accelerator architectures for high performance computing are often suited for the use of a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently…
Pre-trained language models (PLMs) show impressive performance in various downstream NLP tasks. However, pre-training large language models demands substantial memory and training compute. Furthermore, due to the substantial resources…