Related papers: Extending and Implementing the Self-adaptive Virtu…

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-02-05 · Stefan Marr, Michael Haupt, Stijn Timbermont, Bram Adams, Theo D'Hondt, Pascal Costanza, Wolfgang De Meuter

New Trends in Parallel and Distributed Simulation: from Many-Cores to Cloud Computing

Recent advances in computing architectures and networking are bringing parallel computing systems to the masses, thereby increasing the number of potential users of these kinds of systems. In particular, two important technological evolutions are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-04-05 · Gabriele D'Angelo, Moreno Marzolla

Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications

Discrete GPU accelerators, while providing massive computing power for supercomputers and data centers, have their own separate memory domain. Explicit memory management across device and host domains in programming is tedious and error-prone.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-14 · Bennett Cooper, Thomas R. W. Scogland, Rong Ge

A Parallel SystemC Virtual Platform for Neuromorphic Architectures

With the increasing interest in neuromorphic computing, designers of embedded systems face the challenge of efficiently simulating such platforms to enable architecture design exploration early in the development cycle. Executing artificial…

Hardware Architecture · Computer Science · 2021-12-28 · Melvin Galicia, Farhad Merchant, Rainer Leupers

Using Virtual Addresses with Communication Channels

While memory is the allocatable quantity on single-processor and SMP machines, on machines made up of large numbers of parallel computing units, each with its own local memory, the allocatable quantity is a single computing unit. Where…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-02-28 · Oskar Schirmer

Evaluating the Self-Optimization Process of the Adaptive Memory Management Architecture Self-aware Memory

With continuously increasing integration levels, manycore processor systems are likely to become the prevailing system structure not only in HPC but also in desktop and mobile systems. Today, manycore processors such as Tilera TILE, KALRAY MPPA…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-05-14 · Oliver Mattes, Wolfgang Karl

Parallel and Distributed Simulation from Many Cores to the Public Cloud (Extended Version)

In this tutorial paper, we first review some basic simulation concepts and then introduce parallel and distributed simulation techniques in view of some new challenges of today and tomorrow. More specifically, in recent years…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-03-19 · Gabriele D'Angelo

Fast Support Vector Machines Using Parallel Adaptive Shrinking on Distributed Systems

Support Vector Machines (SVMs), a popular machine learning technique, have been applied to a wide range of domains such as science, finance, and social networks for supervised learning. Whether it is identifying high-risk patients by…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-06-20 · Jeyanthi Narasimhan, Abhinav Vishnu, Lawrence Holder, Adolfy Hoisie

Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips

The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores…

Hardware Architecture · Computer Science · 2013-10-30 · Blake A. Hechtman, Daniel J. Sorin

Performance limitations for sparse matrix-vector multiplications on current multicore environments

The increasing importance of multicore processors calls for a reevaluation of established numerical algorithms in view of their ability to profit from this new hardware concept. In order to optimize the existing algorithms, a detailed…

Performance · Computer Science · 2012-03-01 · Gerald Schubert, Georg Hager, Holger Fehske

SAPA: Self-Aware Polymorphic Architecture

In this work, we introduce a Self-Aware Polymorphic Architecture (SAPA) design approach to support emerging context-aware applications and mitigate the programming challenges caused by the ever-increasing complexity and heterogeneity of…

Hardware Architecture · Computer Science · 2018-02-15 · Michel A. Kinsy, Mihailo Isakov, Alan Ehret, Donato Kava

MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference

Real-time systems, particularly those used in domains like automated driving, are increasingly adopting neural networks. From this trend arises the need for high-performance hardware exhibiting predictable timing behavior. While…

Hardware Architecture · Computer Science · 2026-02-26 · Maximilian Kirschner, Konstantin Dudzik, Ben Krusekamp, Jürgen Becker

Virtual memory for real-time systems using hPMP

To satisfy automotive safety and security requirements, memory protection mechanisms are an essential component of automotive microcontrollers. In today's available systems, either a fully physical address-based protection is implemented…

Hardware Architecture · Computer Science · 2025-04-08 · Konrad Walluszik, Daniel Auge, Gerhard Wirrer, Holm Rauchfuss, Thomas Roecker

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…

Hardware Architecture · Computer Science · 2022-04-05 · Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework

The Support Vector Machine (SVM) algorithm incurs a high computational cost (in both memory and time) to solve a complex quadratic programming (QP) optimization problem during training. Consequently, SVM necessitates high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-11-28 · Islam Elgarhy

Scalable data abstractions for distributed parallel computations

The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software, and a key abstraction this provides is the separation of the representation of data from the computation. Many…

Programming Languages · Computer Science · 2012-10-04 · James Hanlon, Simon J. Hollis, David May

SEDM: Scalable Self-Evolving Distributed Memory for Agents

Long-term multi-agent systems inevitably generate vast amounts of trajectories and historical interactions, which makes efficient memory management essential for both performance and scalability. Existing methods typically depend on vector…

Artificial Intelligence · Computer Science · 2025-09-29 · Haoran Xu, Jiacong Hu, Ke Zhang, Lei Yu, Yuxin Tang, Xinyuan Song, Yiqun Duan, Lynn Ai, Bill Shi

Software-Distributed Shared Memory for Heterogeneous Machines: Design and Use Considerations

Distributed shared memory (DSM) allows applications to be implemented and deployed on distributed architectures using the convenient shared memory programming model, in which a set of tasks are able to allocate and access data despite their…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-04 · Loïc Cudennec

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a…

Hardware Architecture · Computer Science · 2018-08-30 · Andreas Kurth, Pirmin Vogel, Andrea Marongiu, Luca Benini

Work-in-Progress: Real-Time Neural Network Inference on a Custom RISC-V Multicore Vector Processor

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science · 2024-10-15 · Maximilian Kirschner, Konstantin Dudzik, Jürgen Becker