Related papers: Massively Parallel Processor Architectures for Res…
Future MPSoCs with 1000 or more processor cores on a chip require new means for resource-aware programming in order to deal with increasing imperfections such as process variation, fault rates, aging effects, and power as well as thermal…
Processor accelerators are effective because they do not work (completely) on the principles of stored-program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…
In this work, we introduce a Self-Aware Polymorphic Architecture (SAPA) design approach to support emerging context-aware applications and mitigate the programming challenges caused by the ever-increasing complexity and heterogeneity of…
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators comprises so-called processor arrays, which typically integrate a two-dimensional mesh of…
Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…
With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to…
Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by…
Loop compilation for Tightly Coupled Processor Arrays (TCPAs), a class of massively parallel loop accelerators, entails solving NP-hard problems, yet depends on the loop bounds and number of available processing elements (PEs), parameters…
Multidimensional loop kernels often suffer from control overhead that can dominate execution time on parallel loop accelerators. Tightly Coupled Processor Arrays (TCPAs) offload loop control to a global controller (GC), but existing…
Today, solutions to very large computing problems must be found rapidly. This motivates parallel computing, in which several machines or processors work cooperatively on computational tasks. In the past decades, there have been a…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…
A new approach to designing processor accelerators is presented. A new computing model and a special kind of accelerator with dynamic (end-user programmable) architecture is suggested. The new model considers a processor, in which a newly…
Recent advances in computing architectures and networking are bringing parallel computing systems to the masses, thereby increasing the number of potential users of these kinds of systems. In particular, two important technological evolutions are…
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…
Many-core accelerators, as represented by the Xeon Phi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential…
Owing to the increasing complexity of network applications and internet traffic, network processors, as a subset of embedded processors, have to process more computation-intensive tasks. With the scaling down of feature size and the emergence of chip…
Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…
High-Performance Computing (HPC) processors are nowadays integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output…
Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially the GPUs (Graphics…