Related papers: Algorithm 979: Recursive Algorithms for Dense Line…
Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…
It is well known that the behavior of dense linear algebra algorithms is greatly influenced by factors like target architecture, underlying libraries and even problem size; because of this, the accurate prediction of their performance is a…
This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…
A large class of dense linear algebra operations, such as LU decomposition or inversion of a triangular matrix, are usually performed by blocked algorithms. For one such operation, typically, not only one but many algorithmic variants…
Optimal use of computing resources requires extensive coding, tuning and benchmarking. To boost developer productivity in these time consuming tasks, we introduce the Experimental Linear Algebra Performance Studies framework (ELAPS), a…
In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide…
LAPACK and ScaLAPACK are arguably the defacto standard libraries among the scientific community for solving linear algebra problems on sequential, shared-memory and distributed-memory architectures. While ease of use was a major design goal…
One of the greatest efforts of computational scientists is to translate the mathematical model describing a class of physical phenomena into large and complex codes. Many of these codes face the difficulty of implementing the mathematical…
Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches…
This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their…
Most, if not all the modern scientific simulation packages utilize matrix algebra operations. Among the operation of the linear algebra, one of the most important kernels is the multiplication of matrices, dense and sparse. Examples of…
The LINPACK benchmark reports the performance of a computer for solving a system of linear equations with dense random matrices. Although this task was not designed with a real application directly in mind, the LINPACK benchmark has been…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
A new runtime environment for the execution of recursive matrix algorithms on a supercomputer with distributed memory is proposed. It is designed both for dense and sparse matrices. The environment ensures decentralized control of the…
Dijkstra observed that verifying correctness of a program is difficult and conjectured that derivation of a program hand-in-hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to the…
Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…
The article deals with a kind of recursive function templates in C++, where the recursion is realized corresponding template parameters to achieve better computational performance. Some specialization of these template functions ends the…
Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK)…
Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical…
Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules for programs with many high-dimensional inputs is a notoriously…