Related papers: Deriving Correct High-Performance Algorithms
It is well known that the behavior of dense linear algebra algorithms is greatly influenced by factors like target architecture, underlying libraries and even problem size; because of this, the accurate prediction of their performance is a…
Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…
Linear algebraic expressions are the essence of many computationally intensive problems, including scientific simulations and machine learning applications. However, translating high-level formulations of these expressions to efficient…
In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide…
We present a prototypical linear algebra compiler that automatically exploits domain-specific knowledge to generate high-performance algorithms. The input to the compiler is a target equation together with knowledge of both the structure of…
Computer Algebra systems are widely spread because of some of their remarkable features such as their ease of use and performance. Nonetheless, this focus on performance sometimes leads to unwanted consequences: algorithms and computations…
This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their…
This dissertation focuses on the design and the implementation of domain-specific compilers for linear algebra matrix equations. The development of efficient libraries for such equations, which lie at the heart of most software for…
We study the problem of learning differentiable functions expressed as programs in a domain-specific language. Such programmatic models can offer benefits such as composability and interpretability; however, learning them requires…
We discuss proving correctness and completeness of definite clause logic programs. We propose a method for proving completeness, while for proving correctness we employ a method which should be well known but is often neglected. Also, we…
High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming…
We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…
Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing…
Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference…
To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance…
Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules for programs with many high-dimensional inputs is a notoriously…
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming…
Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These can range from simple loop unroll factors or vector sizes all the way to algorithmic changes, given that some…
This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast…
Biclustering, also known as co-clustering or two-way clustering, simultaneously partitions the rows and columns of a data matrix to reveal submatrices with coherent patterns. Incorporating background knowledge into clustering to enhance…