Related papers: FPGA-based Multi-Chip Module for High-Performance …
Exascale systems are predicted to have approximately one billion cores, assuming Gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel…
We present and evaluate the ExaNeSt Prototype, a liquid-cooled rack prototype consisting of 256 Xilinx ZU9EG MPSoCs, 4 TBytes of DRAM, 16 TBytes of SSD, and configurable interconnection 10-Gbps hardware. We developed this testbed in…
Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our previous recent work showed scaling of an FMM on GPU clusters, with problem sizes in the…
The deployment of the next generation computing platform at ExaFlops scale requires to solve new technological challenges mainly related to the impressive number (up to 10^6) of compute elements required. This impacts on system power…
In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The…
Hybrid memory systems, comprised of emerging non-volatile memory (NVM) and DRAM, have been proposed to address the growing memory demand of applications. Emerging NVM technologies, such as phase-change memories (PCM), memristor, and 3D…
FFT, FMM, and multigrid methods are widely used fast and highly scalable solvers for elliptic PDEs. However, emerging large-scale computing systems are introducing challenges in comparison to current petascale computers. Recent efforts…
In this work, we propose an architecture and methodology to design hardware/software systems for high-performance embedded computing on FPGA. The hardware side is based on a many-core architecture whose design is generated automatically…
Emulating chip functionality before silicon production is crucial, especially with the increasing prevalence of RISC-V-based designs. FPGAs are promising candidates for such purposes due to their high-speed and reconfigurable architecture.…
This paper presents SynapticCore-X, a modular and resource-efficient neural processing architecture optimized for deployment on low-cost FPGA platforms. The design integrates a lightweight RV32IMC RISC-V control core with a configurable…
Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple…
This work presents the design and preliminary performance of a highly-multiplexed superconducting detector readout. The readout system is implemented on the Xilinx ZCU111 RFSoC Evaluation Board. The current design uses 12% of the DSPs, 60%…
In recent years the computational capacity of single Field Programmable Gate Arrays (FPGA) devices as well as their versatility has increased significantly. Adding to that the High Level Synthesis frameworks allowing to program such…
The Xilinx ZCU111 Radio Frequency System on Chip (RFSoC) is a promising solution for reading out large arrays of microwave kinetic inductance detectors (MKIDs). The board boasts eight on-chip 12-bit / 4.096 GSPS analogue-to-digital…
This work arises on the environment of the ExaNeSt project aiming at design and development of an exascale ready supercomputer with low energy consumption profile but able to support the most demanding scientific and technical applications.…
FPGA-based hardware accelerators have received increasing attention mainly due to their ability to accelerate deep pipelined applications, thus resulting in higher computational performance and energy efficiency. Nevertheless, the amount of…
Low bit-width Quantized Neural Networks (QNNs) enable deployment of complex machine learning models on constrained devices such as microcontrollers (MCUs) by reducing their memory footprint. Fine-grained asymmetric quantization (i.e.,…
While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of…
We propose advancing photonic in-memory computing through three-dimensional photonic-electronic integrated circuits using phase-change materials (PCM) and AlGaAs-CMOS technology. These circuits offer high precision (greater than 12 bits),…
Autonomous robots require efficient on-device learning to adapt to new environments without cloud dependency. For this edge training, Microscaling (MX) data types offer a promising solution by combining integer and floating-point…