Related papers: LAW: A Tool for Improved Productivity with High-Pe…
Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches…
To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance…
In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide…
Sparse linear algebra is central to many scientific programs, yet compilers fail to optimize it well. High-performance libraries are available, but adoption costs are significant. Moreover, libraries tie programs into vendor-specific…
This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…
Optimal use of computing resources requires extensive coding, tuning and benchmarking. To boost developer productivity in these time consuming tasks, we introduce the Experimental Linear Algebra Performance Studies framework (ELAPS), a…
A major challenge in the deployment of scientific software solutions is the adaptation of research prototypes to production-grade code. While high-level languages like MATLAB are useful for rapid prototyping, they lack the resource…
One of the greatest efforts of computational scientists is to translate the mathematical model describing a class of physical phenomena into large and complex codes. Many of these codes face the difficulty of implementing the mathematical…
Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing…
We observe a disconnect between the developers and the end users of linear algebra libraries. On the one hand, the numerical linear algebra and the high-performance communities invest significant effort in the development and optimization…
Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK)…
The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently…
The MPLAPACK (formerly MPACK) is a multiple-precision version of LAPACK (https://www.netlib.org/lapack/). MPLAPACK version 2.0.1 is based on LAPACK version 3.9.1 and translated from Fortran 90 to C++ using FABLE, a Fortran to C++…
The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for…
We present SLinGen, a program generation system for linear algebra. The input to SLinGen is an application expressed mathematically in a linear-algebra-inspired language (LA) that we define. LA provides basic scalar/vector/matrix…
Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate performance of the HPC applications. Performance in such…
Clawpack is a library for solving nonlinear hyperbolic partial differential equations using high-resolution finite volume methods based on Riemann solvers and limiters. It supports Adaptive Mesh Refinement (AMR), which is essential in…
This dissertation focuses on the design and the implementation of domain-specific compilers for linear algebra matrix equations. The development of efficient libraries for such equations, which lie at the heart of most software for…
Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks. The High-Performance Computing community invests significant effort in the development of architecture-specific optimized kernels, such…