Related papers: Performance portability through machine learning g…
Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach is good for deploying machine learning models, where the network…
Over recent years heterogeneous systems have become more prevalent across HPC systems, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics…
The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations…
Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such…
Linux kernel is a huge code base with enormous number of subsystems and possible configuration options that results in unmanageable complexity of elaborating an efficient configuration. Machine Learning (ML) is approach/area of learning…
High-performance computing (HPC) applications are increasingly executed in heterogeneous environments, introducing new challenges for programming and software portability. SYCL has emerged as a leading model designed to simplify…
Hand-optimizing linear algebra kernels for different GPU devices and applications is complex and labor-intensive. Instead, many developers use automatic performance tuning (autotuning) to achieve high performance on a variety of devices.…
The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study extends our previous research on SYCL's…
Autotuning of performance-relevant source-code parameters allows to automatically tune applications without hard coding optimizations and thus helps with keeping the performance portable. In this paper, we introduce a benchmark set of ten…
In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the…
Kernel methods have proven to be powerful techniques for pattern analysis and machine learning (ML) in a variety of domains. However, many of their original or advanced implementations remain in Matlab. With the incredible rise and adoption…
Applying machine learning to biological sequences - DNA, RNA and protein - has enormous potential to advance human health, environmental sustainability, and fundamental biological understanding. However, many existing machine learning…
Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple machine learning models that are competitive on a variety of tasks, it…
As the interest in FPGA-based accelerators for HPC applications increases, new challenges also arise, especially concerning different programming and portability issues. This paper aims to provide a snapshot of the current state of the FPGA…
Multiple Kernel Learning, or MKL, extends (kernelized) SVM by attempting to learn not only a classifier/regressor but also the best kernel for the training task, usually from a combination of existing kernel functions. Most MKL methods seek…
Optimizing GPU kernels presents a significantly greater challenge for large language models (LLMs) than standard code generation tasks, as it requires understanding hardware architecture, parallel optimization strategies, and performance…
The success of kernel-based learning methods depend on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce…
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus…
High-performance computing (HPC) is a major driver accelerating scientific research and discovery, from quantum simulations to medical therapeutics. While the increasing availability of HPC resources is in many cases pivotal to successful…
Multiple kernel learning (MKL) method is generally believed to perform better than single kernel method. However, some empirical studies show that this is not always true: the combination of multiple kernels may even yield an even worse…