Related papers: A Distributed Shared Memory Model and C++ Template…

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-18 James A. Ross, David A. Richie, Song J. Park, Dale R. Shires

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-15 James A. Ross, David A. Richie

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David Richie, James Ross

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 James Ross, David Richie

Distributed Semi-Speculative Parallel Anisotropic Mesh Adaptation

This paper presents a distributed memory method for anisotropic mesh adaptation that is designed to avoid the use of collective communication and global synchronization techniques. In the presented method, meshing functionality is separated…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-18 Kevin Garner, Polykarpos Thomadakis, Nikos Chrisochoides

Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David A. Richie, James A. Ross

Software-Distributed Shared Memory for Heterogeneous Machines: Design and Use Considerations

Distributed shared memory (DSM) allows to implement and deploy applications onto distributed architectures using the convenient shared memory programming model in which a set of tasks are able to allocate and access data despite their…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-04 Loïc Cudennec

Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor

In the construction of exascale computing systems energy efficiency and power consumption are two of the major challenges. Low-power high performance embedded systems are of increasing interest as building blocks for large scale high-…

Hardware Architecture · Computer Science 2014-11-03 Anish Varghese, Bob Edwards, Gaurav Mitra, Alistair P. Rendell

Towards Distributed Semi-speculative Adaptive Anisotropic Parallel Mesh Generation

This paper presents the foundational elements of a distributed memory method for mesh generation that is designed to leverage concurrency offered by large-scale computing. To achieve this goal, meshing functionality is separated from…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-22 Kevin Garner, Christos Tsolakis, Polykarpos Thomadakis, Nikos Chrisochoides

Fast Dynamic Memory Integration in Co-Simulation Frameworks for Multiprocessor System on-Chip

In this paper is proposed a technique to integrate and simulate a dynamic memory in a multiprocessor framework based on C/C++/SystemC. Using host machine's memory management capabilities, dynamic data processing is supported without…

Hardware Architecture · Computer Science 2011-11-09 O. Villa, P. Schaumont, I. Verbauwhede, M. Monchiero, G. Palermo

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

MODC: Resilience for disaggregated memory architectures using task-based programming

Disaggregated memory architectures provide benefits to applications beyond traditional scale out environments, such as independent scaling of compute and memory resources. They also provide an independent failure model, where computations…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Kimberly Keeton, Sharad Singhal, Haris Volos, Yupu Zhang, Ramesh Chandra Chaurasiya, Clarete Riana Crasta, Sherin T George, Nagaraju K N, Mashood Abdulla K, Kavitha Natarajan, Porno Shome, Sanish Suresh

Parallel Delta-Stepping Algorithm for Shared Memory Architectures

We present a shared memory implementation of a parallel algorithm, called delta-stepping, for solving the single source shortest path problem for directed and undirected graphs. In order to reduce synchronization costs we make some…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-21 M. Kranjčević, D. Palossi, S. Pintarelli

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems

Parallel programmers face the often irreconcilable goals of programmability and performance. HPC systems use distributed memory for scalability, thereby sacrificing the programmability advantages of shared memory programming models.…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-01-21 Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

JArena: Partitioned Shared Memory for NUMA-awareness in Multi-threaded Scientific Applications

The distributed shared memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and inevitably exhibits non-uniform memory access (NUMA) to shared-memory parallel applications.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-21 Zhang Yang, Aiqing Zhang, Zeyao Mo

The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via…

Databases · Computer Science 2022-07-08 Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref

The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture

One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip…

Hardware Architecture · Computer Science 2012-03-08 Andrea Biagioni, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Mersia Perra, Davide Rossetti, Carlo Sidore, Francesco Simula, Laura Tosoratto, Piero Vicini

Parallel Data Distribution Management on Shared-Memory Multiprocessors

The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification -- a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-28 Moreno Marzolla, Gabriele D'Angelo

A Complexity Separation Between the Cache-Coherent and Distributed Shared Memory Models

We consider asynchronous multiprocessor systems where processes communicate by accessing shared memory. Exchange of information among processes in such a multiprocessor necessitates costly memory accesses called remote memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-26 Wojciech Golab

Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips

The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores…

Hardware Architecture · Computer Science 2013-10-30 Blake A. Hechtman, Daniel J. Sorin