Mathematical Software
R (Version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between $1$ and $m$ by multiplying random floats by $m$, taking the floor, and adding $1$ to the result. Well-known quantization…
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular…
Numerical accuracy of floating point computation is a well studied topic which has not made its way to the end-user in scientific computing. Yet, it has become a critical issue with the recent requirements for code modernization to harness…
In this study, the gravitational octree code originally optimized for the Fermi, Kepler, and Maxwell GPU architectures is adapted to the Volta architecture. The Volta architecture introduces independent thread scheduling requiring either…
We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast…
This report describes the newly added Julia interface to the NFFT3 library. We explain the multidimensional NFFT algorithm and basics of the interface. Furthermore, we go into detail about the different parameters and how to adjust them…
Matrix and tensor operations form the basis of a wide range of fields and applications, and in many cases constitute a substantial part of the overall computational complexity. The ability of general-purpose GPUs to speed up many of these…
This paper addresses spatial programming of sparse matrix computations for productive performance. The challenge is how to express an irregular computation and its optimizations in a regular way. A sparse matrix has (non-zero) values and a…
We propose a platform-independent multi-threaded function library that provides data structures to generate, differentiate and render both the ordinary basis and the normalized B-basis of a user-specified extended Chebyshev (EC) space that…
Large-scale parallel numerical simulations are essential for a wide range of engineering problems that involve complex, coupled physical processes interacting across a broad range of spatial and temporal scales. The data structures involved…
Simple stencil codes are and remain an important building block in scientific computing. On shared memory nodes, they are traditionally parallelised through colouring or (recursive) tiling. New OpenMP versions alternatively allow users to…
In this work are presented and discussed some results related to the validation process of a software module based on PETSc which implements a Data Assimilation algorithm.
High-performance computing platforms are becoming more and more heterogeneous, which makes it very difficult for researchers and scientific software developers to keep up with the rapid changes on the hardware market. In this paper, the…
In recent years, persistent homology has become an attractive method for data analysis. It captures topological features, such as connected components, holes, and voids from point cloud data and summarizes the way in which these features…
System Gramian matrices are a well-known encoding for properties of input-output systems such as controllability, observability or minimality. These so-called system Gramians were developed in linear system theory for applications such as…
Owl is a new numerical library developed in the OCaml language. It focuses on providing a comprehensive set of high-level numerical functions so that developers can quickly build up data analytical applications. In this abstract, we will…
Conventional GPU implementations of Strassen's algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for…
This is a tutorial in applied and computational topology and topological data analysis. It is illustrated with numerous computational examples that utilize Gudhi library. It is under constant development, so please do not consider this…
We equip dynamic geometry software (DGS) with a user-friendly method that enables massively parallel calculations on the graphics processing unit (GPU). This interplay of DGS and GPU opens up various applications in education and…
In this paper we propose a mixed precision algorithm in the context of the semi-Lagrangian discontinuous Galerkin method. The performance of this approach is evaluated on a traditional dual socket workstation as well as on a Xeon Phi and an…