Related papers: Data Access - Experiences Implementing an Object O…
The set of externally visible properties associated with process variables in the Experimental Physics and Industrial Control System (EPICS) is predefined in the EPICS base distribution and is therefore not extensible by plug-compatible…
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout…
Operating Systems are built upon a set of abstractions to provide resource management and programming APIs for common functionality, such as synchronization, communication, protection, and I/O. The process abstraction is the bridge across…
The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across…
Programmability, performance portability, and resource efficiency have emerged as critical challenges in harnessing complex and diverse architectures today to obtain high performance and energy efficiency. While there is abundant research,…
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…
This paper examines disaggregated data center architectures from the perspective of the applications that would run on these data centers, and challenges the abstractions that have been proposed to date. In particular, we argue that…
Approximate memory is a technique to mitigate the performance gap between memory subsystems and CPUs with its reduced access latency at a cost of data integrity. To gain benefit from approximate memory for realistic applications, it is…
Applied research in graph algorithms and combinatorial structures needs comprehensive and versatile software libraries. However, the design and the implementation of flexible libraries are challenging activities. Among the other problems…
We define an abstract framework for object-oriented programming and show that object-oriented languages, such as C++, can be interpreted as parallel programming languages. Parallel C++ code is typically more than ten times shorter than the…
High-performance object stores are an emerging technology which offers an alternative solution in the field of HPC storage, with potential to address long-standing scalability issues in traditional distributed POSIX file systems due to…
Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…
A newcomer in the Partitioned Global Address Space (PGAS) 'world' has arrived in its version 1.0: Unified Parallel C++ (UPC++). UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions…
Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model…
As applications grow in capability, they also grow in complexity. This complexity in turn gets pushed into modules and libraries. In addition, hardware configurations become increasingly elaborate, too. These two trends make understanding,…
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…
Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable,…
Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…
Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure. Our goal is to bring efficient, object-oriented…
Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force…