Related papers: Parallel Programming Model for the Epiphany Many-C…

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-15 James A. Ross , David A. Richie

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor

The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-28 David Richie , James Ross , Jamie Infantolino

Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David A. Richie , James A. Ross

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David Richie , James Ross

Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor

In the construction of exascale computing systems energy efficiency and power consumption are two of the major challenges. Low-power high performance embedded systems are of increasing interest as building blocks for large scale high-…

Hardware Architecture · Computer Science 2014-11-03 Anish Varghese , Bob Edwards , Gaurav Mitra , Alistair P. Rendell

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 James Ross , David Richie

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need of parallelism. While there are some programming model to express parallelism, thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta , A. B. Mutiara

ePython: An implementation of Python for the many-core Epiphany coprocessor

The Epiphany is a many-core, low power, low on-chip memory architecture and one can very cheaply gain access to a number of parallel cores which is beneficial for HPC education and prototyping. The very low power nature of these…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-29 Nick Brown

Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany

In this paper we introduce Epiphany as a high-performance energy-efficient manycore architecture suitable for real-time embedded systems. This scalable architecture supports floating point operations in hardware and achieves 50 GFLOPS/W in…

Hardware Architecture · Computer Science 2014-12-18 Andreas Olofsson , Tomas Nordström , Zain Ul-Abdin

Improving Latency in a Signal Processing System on the Epiphany Architecture

In this paper we use the Adapteva Epiphany manycore chip to demonstrate how the throughput and the latency of a baseband signal processing chain, typically found in LTE or WiFi, can be optimized by a combination of task- and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-18 Peter Brauer , Martin Lundqvist , Aare Mällo

Frustrated with MPI+Threads? Try MPIxThreads!

MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Hui Zhou , Ken Raffenetti , Junchao Zhang , Yanfei Guo , Rajeev Thakur

Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems

Message Passing Interface (MPI) is widely used to implement parallel programs. Although Windowsbased architectures provide the facilities of parallel execution and multi-threading, little attention has been focused on using MPI on these…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-05-31 Alaa Ismail Elnashar

Parallel Programming with MatlabMPI

MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI…

Astrophysics · Physics 2007-05-23 Jeremy Kepner

Asynchronous MPI for the Masses

We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure. It utilizes the MPI profiling interface…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-19 Markus Wittmann , Georg Hager , Thomas Zeiser , Gerhard Wellein

MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

The advent of multi-/many-core processors in clusters advocates hybrid parallel programming, which combines Message Passing Interface (MPI) for inter-node parallelism with a shared memory model for on-node parallelism. Compared to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-15 Huan Zhou , Jose Gracia , Ralf Schneider

Benchmarking mixed-mode PETSc performance on high-performance architectures

The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-07-19 Michael Lange , Gerard Gorman , Michele Weiland , Lawrence Mitchell , Xiaohu Guo , James Southern

MPIgnite: An MPI-Like Language and Prototype Implementation for Apache Spark

Scale-out parallel processing based on MPI is a 25-year-old standard with at least another decade of preceding history of enabling technologies in the High Performance Computing community. Newer frameworks such as MapReduce, Hadoop, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-18 Brandon L. Morris , Anthony Skjellum

PartRePer-MPI: Combining Fault Tolerance and Performance for MPI Applications

As we have entered Exascale computing, the faults in high-performance systems are expected to increase considerably. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to create checkpoints at a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-26 Sarthak Joshi , Sathish Vadhiyar

A thread-parallel algorithm for anisotropic mesh adaptation

Anisotropic mesh adaptation is a powerful way to directly minimise the computational cost of mesh based simulation. It is particularly important for multi-scale problems where the required number of floating-point operations can be reduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-08-13 Georgios Rokos , Gerard J. Gorman , James Southern , Paul H. J. Kelly

Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-13 Michele Weiland , Lawrence Mitchell , Gerard Gorman , Stephan Kramer , Mark Parsons , James Southern