数学软件
We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal…
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a…
We describe an algorithm to evaluate all the complex branches of the Lambert W function with rigorous error bounds in interval arithmetic, which has been implemented in the Arb library. The classic 1996 paper on the Lambert W function by…
We introduce the CUDA Tensor Transpose (cuTT) library that implements high-performance tensor transposes for NVIDIA GPUs with Kepler and above architectures. cuTT achieves high performance by (a) utilizing two GPU-optimized transpose…
Background: Component-based modeling language Modelica (OpenModelica is open source implementation) is used for the numerical simulation of complex processes of different nature represented by ODE system. However, in OpenModelica standard…
High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming…
Tensor contraction (TC) is an important computational kernel widely used in numerous applications. It is a multi-dimensional generalization of matrix multiplication (GEMM). While Strassen's algorithm for GEMM is well studied in theory and…
The dune-functions dune module introduces a new programmer interface for discrete and non-discrete functions. Unlike the previous interfaces considered in the existing dune modules, it is based on overloading operator(), and returning…
Large scale parameter estimation problems are among some of the most computationally demanding problems in numerical analysis. An academic researcher's domain-specific knowledge often precludes that of software design, which results in…
[abridged] Context: Multidimensional arrays are used by many different algorithms. As such, indexing and broadcasting complex operations over multidimensional arrays are ubiquitous tasks and can be performance limiting. Inquiry:…
A novel and scalable geometric multi-level algorithm is presented for the numerical solution of elliptic partial differential equations, specially designed to run with high occupancy of streaming processors inside Graphics Processing…
We describe a parallel, adaptive, multi-block algorithm for explicit integration of time dependent partial differential equations on two-dimensional Cartesian grids. The grid layout we consider consists of a nested hierarchy of fixed size,…
Extreme-scale computational science increasingly demands multiscale and multiphysics formulations. Combining software developed by independent groups is imperative: no single team has resources for all predictive science and decision…
We describe a proof of the Central Limit Theorem that has been formally verified in the Isabelle proof assistant. Our formalization builds upon and extends Isabelle's libraries for analysis and measure-theoretic probability. The proof of…
We propose a scheme for reduced-precision representation of floating point data on a continuum between IEEE-754 floating point types. Our scheme enables the use of lower precision formats for a reduction in storage space requirements and…
The increasing number of applications requiring the solution of large scale singular value problems have rekindled interest in iterative methods for the SVD. Some promising recent ad- vances in large scale iterative methods are still…
This paper presents our work on designing scalable linear solvers for large-scale reservoir simulations. The main objective is to support implementation of parallel reservoir simulators on distributed-memory parallel systems, where MPI…
SimOutUtils is a suite of MATLAB/Octave functions for studying and analyzing time series-like output from stochastic simulation models. More specifically, SimOutUtils allows modelers to study and visualize simulation output dynamics,…
Revisionist integral deferred correction (RIDC) methods are a family of parallel--in--time methods to solve systems of initial values problems. The approach is able to bootstrap lower order time integrators to provide high order…
Firedrake is a new tool for automating the numerical solution of partial differential equations. Firedrake adopts the domain-specific language for the finite element method of the FEniCS project, but with a pure Python runtime-only…