Related papers: Evolving the COLA software library
In recent history, GPUs became a key driver of compute performance in HPC. With the installation of the Frontier supercomputer, they became the enablers of the Exascale era; further largest-scale installations are in progress (Aurora, El…
Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high…
We present a comparison of several modern C++ libraries providing high-level interfaces for programming multi- and many-core architectures on top of CUDA or OpenCL. The comparison focuses on the solution of ordinary differential equations…
NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade. Most applications that benefit from GPUs have thus been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems are, however,…
The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC's SX architecture family. Its multi-core vector processor features a vector length of 16 kbits and interfaces with up to 48 GB of HBM2 memory in the current models,…
Supercomputers become faster as hardware and software technologies continue to evolve. Current supercomputers are capable of 1015 floating point operations per second (FLOPS) that called Petascale system. The High Performance Computer (HPC)…
Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. We consider learning of linear classification and regression models, in the setting where the training data is…
Bioinformatics and Computational Biology are two fields that have been exploiting GPUs for more than two decades, being CUDA the most used programming language for them. However, as CUDA is an NVIDIA proprietary language, it implies a…
The inability to predict lasting languages and architectures led us to develop OCCA, a C++ library focused on host-device interaction. Using run-time compilation and macro expansions, the result is a novel single kernel language that…
The presence of GPU from different vendors demands the Lattice QCD codes to support multiple architectures. To this end, Open Computing Language (OpenCL) is one of the viable frameworks for writing a portable code. It is of interest to find…
This work explores the feasibility of specialized hardware implementing the Cortical Learning Algorithm (CLA) in order to fully exploit its inherent advantages. This algorithm, which is inspired in the current understanding of the mammalian…
FPGAs have found increasing adoption in data center applications since a new generation of high-level tools have become available which noticeably reduce development time for FPGA accelerators and still provide high quality of results.…
As CUDA programs become the de facto program among data parallel applications such as high-performance computing or machine learning applications, running CUDA on other platforms has been a compelling option. Although several efforts have…
In recent years the computational capacity of single Field Programmable Gate Arrays (FPGA) devices as well as their versatility has increased significantly. Adding to that the High Level Synthesis frameworks allowing to program such…
Recent increases in supercomputing power, driven by the multi-core revolution and accelerators such as the IBM Cell processor, graphics processing units (GPUs) and Intel's Many Integrated Core (MIC) technology have enabled kinetic…
Using commodity component personal computers based on Alpha processor and commodity network devices and a switch, we built an 8-node parallel computer. GNU/Linux is chosen as an operating system and message passing libraries such as PVM,…
We define a benchmark suite for lattice QCD and report on benchmark results from several computer platforms. The platforms considered are apeNEXT, CRAY T3E, Hitachi SR8000, IBM p690, PC-Clusters, and QCDOC.
The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the…
Heterogeneity is the prevalent trend in the rapidly evolving high-performance computing (HPC) landscape in both hardware and application software. The diversity in hardware platforms, currently comprising various accelerators and a future…
The world's largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework, developed…