Mathematical Software
This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving…
Automatic differentiation (AD) is an essential primitive for machine learning programming systems. Tangent is a new library that performs AD using source code transformation (SCT) in Python. It takes numeric functions written in a syntactic…
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 10-20 iterations of LOBPCG…
Code generation based software platforms, such as Firedrake, have become popular tools for developing complicated finite element discretisations of partial differential equations. We extended the code generation infrastructure in Firedrake…
We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the…
We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). Concretely, in the first case we…
The GooFit package provides physicists a simple, familiar syntax for manipulating probability density functions and performing fits, and is highly optimized for data analysis on NVIDIA GPUs and multithreaded CPU backends. GooFit was updated…
The efficiency of boundary element methods depends crucially on the time required for setting up the stiffness matrix. The far-field part of the matrix can be approximated by compression schemes like the fast multipole method or…
The acceleration of sparse matrix computations on modern many-core processors, such as the graphics processing units (GPUs), has been recognized and studied over a decade. Significant performance enhancements have been achieved for many…
Dijkstra observed that verifying correctness of a program is difficult and conjectured that derivation of a program hand-in-hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to the…
Aboria is a powerful and flexible C++ library for the implementation of particle-based numerical methods. The particles in such methods can represent actual particles (e.g. Molecular Dynamics) or abstract particles used to discretise a…
We describe a new parallel implementation, mplrs, of the vertex enumeration code lrs that uses the MPI parallel environment and can be run on a network of computers. The implementation makes use of a C wrapper that essentially uses the…
We present a comparison of several modern C++ libraries providing high-level interfaces for programming multi- and many-core architectures on top of CUDA or OpenCL. The comparison focuses on the solution of ordinary differential equations…
OpenDreamKit --- "Open Digital Research Environment Toolkit for the Advancement of Mathematics" --- is an H2020 EU Research Infrastructure project that aims at supporting, over the period 2015--2019, the ecosystem of open-source…
Although reliable long precision floating-point arithmetic libraries such as QD and MPFR/GMP are necessary to solve ill-conditioned problems in numerical simulation, long precision BLAS-level computation such as matrix multiplication has…
A parallel numerical simulation algorithm is presented for fractional-order systems involving Caputo-type derivatives, based on the Adams-Bashforth-Moulton (ABM) predictor-corrector scheme. The parallel algorithm is implemented using…
Regional hydrology studies are often supported by high resolution simulations of subsurface flow that require expensive and extensive computations. Efficient usage of the latest high performance parallel computing systems becomes a…
In addition to hardware wall-time restrictions commonly seen in high-performance computing systems, it is likely that future systems will also be constrained by energy budgets. In the present work, finite difference algorithms of varying…
There are several AD tools available, which all implement different strategies for the reverse mode of AD. The major strategies are primal value taping (implemented e.g. by ADOL-c) and Jacobi taping (implemented e.g. by adept and dco/c++).…
Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing and implementing…