Related papers: Soft Tiles: Capturing Physical Implementation Flex…

200 papers

FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with a goal to improve designers' productivity through balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full…

Hardware Architecture · Computer Science 2016-06-22 Ho-Cheung Ng, Cheng Liu, Hayden Kwok-Hay So

Systolic arrays and shared-L1-memory manycore clusters are commonly used architectural paradigms that offer different trade-offs to accelerate parallel workloads. While the first excel with regular dataflow at the cost of rigid…

Hardware Architecture · Computer Science 2024-04-25 Sergio Mazzola, Samuel Riedel, Luca Benini

Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-12 Fabio Montagna, Stefan Mach, Simone Benatti, Angelo Garofalo, Gianmarco Ottavi, Luca Benini, Davide Rossi, Giuseppe Tagliavini

The chip industry continues advancing and expanding modern computing systems, resulting in more complex multi-core processors. Conversely, academic projects face scalability challenges due to limited resources, highlighting the need for…

As computational paradigms evolve, applications such as attention-based models, wireless telecommunications, and computer vision impose increasingly challenging requirements on computer architectures: significant memory footprints and…

Hardware Architecture · Computer Science 2025-04-08 Sergio Mazzola, Yichao Zhang, Marco Bertuletti, Diyou Shen, Luca Benini

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is not efficient, resulting in high costs and power consumption as well as…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-15 Javier Prades, Blesson Varghese, Carlos Reano, Federico Silla

Shared L1-memory clusters of streamlined instruction processors (processing elements - PEs) are commonly used as building blocks in modern, massively parallel computing architectures (e.g. GP-GPUs). Scaling out these architectures by…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-03 Yichao Zhang, Marco Bertuletti, Chi Zhang, Samuel Riedel, Diyou Shen, Bowen Wang, Alessandro Vanelli-Coralli, Luca Benini

The emergence of heterogeneity and domain-specific architectures targeting deep learning inference shows great potential for enabling the deployment of modern CNNs on resource-constrained embedded platforms. A significant development is the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-25 Dmitri Lyalikov

With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-01 Ashkan Tousimojarad, Wim Vanderbauwhede

Next-generation wireless technologies (for immersive-massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to…

Hardware Architecture · Computer Science 2025-07-08 Samuel Riedel, Yichao Zhang, Marco Bertuletti, Luca Benini

The success of DNNs and their high computational requirements pushed for large codesign efforts aiming at DNN acceleration. Since DNNs can be represented as static computational graphs, static memory allocation and tiling are two crucial…

Hardware Architecture · Computer Science 2025-04-08 Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini

Many applications in computational science are sufficiently compute-intensive that they depend on the power of parallel computing for viability. For all but the "embarrassingly parallel" problems, the performance depends upon the level of…

High Energy Physics - Lattice · Physics 2009-09-29 Z. Sroczynski, N. Eicker, Th. Lippert, B. Orth, K. Schilling

Scaling up hardware systems has become an important tactic for improving performance as Moore's law fades. Unfortunately, simulations of large hardware systems are often a design bottleneck due to slow throughput and long build times. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-31 Steven Herbst, Noah Moroze, Edgar Iglesias, Andreas Olofsson

Structured Cartesian grids are a fundamental component in numerical simulations. Although these grids facilitate straightforward discretization schemes, their naïve use in sparse domains leads to excessive memory overhead and…

Computational Engineering, Finance, and Science · Computer Science 2025-12-15 Fan Gu, Xiangyu Hu

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…

Programming Languages · Computer Science 2015-08-26 Martin Aigner, Christoph M. Kirsch, Michael Lippautz, Ana Sokolova

Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi

In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented. The modular software architecture is carefully designed to exploit the resources of modern and future supercomputers.…

Mathematical Software · Computer Science 2018-05-28 Nils Kohl, Dominik Thönnes, Daniel Drzisga, Dominik Bartuschat, Ulrich Rüde

In this paper, we report a reimplementation of the core algorithms of relativistic coupled cluster theory aimed at modern heterogeneous high-performance computational infrastructures. The code is designed for efficient parallel execution on…

Real-time systems, particularly those used in domains like automated driving, are increasingly adopting neural networks. From this trend arises the need for high-performance hardware exhibiting predictable timing behavior. While…

Hardware Architecture · Computer Science 2026-02-26 Maximilian Kirschner, Konstantin Dudzik, Ben Krusekamp, Jürgen Becker

GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-20 Robert Clucas, Philip Blakely, Nikolaos Nikiforakis