Related papers: ePython: An implementation of Python for the many-…

Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor

In the construction of exascale computing systems, energy efficiency and power consumption are two of the major challenges. Low-power, high-performance embedded systems are of increasing interest as building blocks for large scale high-…

Hardware Architecture · Computer Science 2014-11-03 Anish Varghese, Bob Edwards, Gaurav Mitra, Alistair P. Rendell

Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany

In this paper we introduce Epiphany as a high-performance energy-efficient manycore architecture suitable for real-time embedded systems. This scalable architecture supports floating point operations in hardware and achieves 50 GFLOPS/W in…

Hardware Architecture · Computer Science 2014-12-18 Andreas Olofsson, Tomas Nordström, Zain Ul-Abdin

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-18 James A. Ross, David A. Richie, Song J. Park, Dale R. Shires

High level programming abstractions for leveraging hierarchical memories with micro-core architectures

Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Maurice Jamieson, Nick Brown

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David Richie, James Ross

Improving Latency in a Signal Processing System on the Epiphany Architecture

In this paper we use the Adapteva Epiphany manycore chip to demonstrate how the throughput and the latency of a baseband signal processing chain, typically found in LTE or WiFi, can be optimized by a combination of task- and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-18 Peter Brauer, Martin Lundqvist, Aare Mällo

Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 David A. Richie, James A. Ross

Asynchronous Execution of Python Code on Task Based Runtime Systems

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and…

Programming Languages · Computer Science 2019-03-08 R. Tohid, Bibek Wagle, Shahrzad Shirzad, Patrick Diehl, Adrian Serio, Alireza Kheirkhahan, Parsa Amini, Katy Williams, Kate Isaacs, Kevin Huck, Steven Brandt, Hartmut Kaiser

pPython for Parallel Python Programming

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-12 Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Kurt Keville, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 James Ross, David Richie

Python Workflows on HPC Systems

The recent successes and widespread application of compute-intensive machine learning and data analytics methods have been boosting the usage of the Python programming language on HPC systems. While Python provides many advantages for the…

Machine Learning · Computer Science 2020-12-02 Dominik Strassel, Philipp Reusch, Janis Keuper

CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities

Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities. This category increasingly includes embedded systems such as mobile…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Moritz Nottebaum, Matteo Dunnhofer, Christian Micheloni

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-15 James A. Ross, David A. Richie

Generation of the Single Precision BLAS library for the Parallella platform, with Epiphany co-processor acceleration, using the BLIS framework

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor)…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-17 Miguel Tasende

Conceptual and Technical Challenges for High Performance Computing

High Performance Computing (HPC) aims at providing reasonably fast computing solutions to scientific and real life problems. The advent of multicore architectures is noticeable in the HPC history, because it has brought the underlying…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-07 Claude Tadonki

Performance Comparison of Python Translators for a Multi-threaded CPU-bound Application

Currently, Python is one of the most widely used languages in various application areas. However, it has limitations when it comes to optimizing and parallelizing applications due to the nature of its official CPython interpreter,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-24 Andrés Milla, Enzo Rucci

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor

The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-28 David Richie, James Ross, Jamie Infantolino

Triangulating Python Performance Issues with Scalene

This paper proposes Scalene, a profiler specialized for Python. Scalene combines a suite of innovations to precisely and simultaneously profile CPU, memory, and GPU usage, all with low overhead. Scalene's CPU and memory profilers help…

Programming Languages · Computer Science 2023-03-24 Emery D. Berger, Sam Stern, Juan Altmayer Pizzorno

A new kind of parallelism and its programming in the Explicitly Many-Processor Approach

The processor accelerators are effective because they are working not (completely) on principles of stored program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-26 János Végh