Related papers: Massively Parallel Processor Architectures for Res…

Invasive Computing - Common Terms and Granularity of Invasion

Future MPSoCs with 1000 or more processor cores on a chip require new means for resource-aware programming in order to deal with increasing imperfections such as process variation, fault rates, aging effects, and power as well as thermal…

Operating Systems · Computer Science 2013-04-23 Jürgen Teich, Wolfgang Schröder-Preikschat, Andreas Herkersdorf

A new kind of parallelism and its programming in the Explicitly Many-Processor Approach

The processor accelerators are effective because they are working not (completely) on principles of stored program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-26 János Végh

SAPA: Self-Aware Polymorphic Architecture

In this work, we introduce a Self-Aware Polymorphic Architecture (SAPA) design approach to support emerging context-aware applications and mitigate the programming challenges caused by the ever-increasing complexity and heterogeneity of…

Hardware Architecture · Computer Science 2018-02-15 Michel A. Kinsy, Mihailo Isakov, Alan Ehret, Donato Kava

Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs

Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of…

Hardware Architecture · Computer Science 2025-02-18 Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-05 Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong

Cache-aware Parallel Programming for Manycore Processors

With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-01 Ashkan Tousimojarad, Wim Vanderbauwhede

Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-02-05 Li Su, Yongluan Zhou

Symbolic Loop Compilation for Tightly Coupled Processor Arrays

Loop compilation for Tightly Coupled Processor Arrays (TCPAs), a class of massively parallel loop accelerators, entails solving NP-hard problems, yet depends on the loop bounds and number of available processing elements (PEs), parameters…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-13 Michael Witterauf, Dominik Walter, Frank Hannig, Jürgen Teich

Loop Control Management in Tightly Coupled Processor Arrays (TCPAs)

Multidimensional loop kernels often suffer from control overhead that can dominate execution time on parallel loop accelerators. Tightly Coupled Processor Arrays (TCPAs) offload loop control to a global controller (GC), but existing…

Hardware Architecture · Computer Science 2026-03-31 Dominik Walter, Frank Hannig, Jürgen Teich

Towards High Performance Computing (HPC) Through Parallel Programming Paradigms and Their Principles

Nowadays, we are to find out solutions to huge computing problems very rapidly. It brings the idea of parallel computing in which several machines or processors work cooperatively for computational tasks. In the past decades, there are a…

Programming Languages · Computer Science 2014-02-07 Brijender Kahanwal

Streaming Graph Algorithms in the Massively Parallel Computation Model

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…

Data Structures and Algorithms · Computer Science 2025-01-20 Artur Czumaj, Gopinath Mishra, Anish Mukherjee

TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…

Hardware Architecture · Computer Science 2024-10-18 Licheng Guo, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun, Zhenman Fang, Zhiru Zhang, Jason Cong

A configurable accelerator for manycores: the Explicitly Many-Processor Approach

A new approach to designing processor accelerators is presented. A new computing model and a special kind of accelerator with dynamic (end-user programmable) architecture is suggested. The new model considers a processor, in which a newly…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-07-07 János Végh

New Trends in Parallel and Distributed Simulation: from Many-Cores to Cloud Computing

Recent advances in computing architectures and networking are bringing parallel computing systems to the masses so increasing the number of potential users of these kinds of systems. In particular, two important technological evolutions are…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-05 Gabriele D'Angelo, Moreno Marzolla

PAGANI: A Parallel Adaptive GPU Algorithm for Numerical

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-24 Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balša Terzić, Mohammad Zubair

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach

Many-core accelerators, as represented by the XeonPhi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential…

Performance · Computer Science 2018-02-09 Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang

Effect of Thread Level Parallelism on the Performance of Optimum Architecture for Embedded Applications

According to the increasing complexity of network application and internet traffic, network processor as a subset of embedded processors have to process more computation intensive tasks. By scaling down the feature size and emersion of chip…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour, Hojjat Taghdisi

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

High-Performance Computing (HPC) processors are nowadays integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output…

Hardware Architecture · Computer Science 2024-02-22 Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini

Programming Massively Parallel Architectures using MARTE: a Case Study

Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially the GPUs (Graphics…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-03-28 Wendell Rodrigues, Frédéric Guyomarc'h, Jean-Luc Dekeyser