Alexander Strack
Gaussian processes are widely used in machine learning domains but remain computationally demanding, limiting their efficient scalability across emerging hardware platforms. The GPRat library addresses these challenges using the HPX…
Hydrodynamics And Radiation Diffusion} (HARD) is an open-source application for high-performance simulations of compressible hydrodynamics with radiation-diffusion coupling. Built on the FleCSI (Flexible Computational Science…
Many important real-world applications, such as System Identification with Gaussian Processes, involve solving linear systems with symmetric positive-definite matrices. The iterative CG method and direct solvers based on the Cholesky…
Writing efficient distributed code remains a labor-intensive and complex endeavor. To simplify application development, the Flexible Computational Science Infrastructure (FleCSI) framework offers a user-oriented, high-level programming…
Large Language Models (LLM) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts:…
Gaussian processes (GPs) are a widely used regression tool, but the cubic complexity of exact solvers limits their scalability. To address this challenge, we extend the GPRat library by incorporating a fully GPU-resident GP prediction…
Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the SOPHON SG2042, have 64 cores. RISC-V processors with core counts…
Python is the de-facto language for software development in artificial intelligence (AI). Commonly used libraries, such as PyTorch and TensorFlow, rely on parallelization built into their BLAS backends to achieve speedup on CPUs. However,…
Due to increasing core counts in modern processors, several task-based runtimes emerged, including the C++ Standard Library for Concurrency and Parallelism (HPX). Although the asynchronous many-task runtime HPX allows implicit communication…
Parallel algorithms relying on synchronous parallelization libraries often experience adverse performance due to global synchronization barriers. Asynchronous many-task runtimes offer task futurization capabilities that minimize or remove…