Related papers: Generating Custom Code for Efficient Query Executi…
In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software…
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than todays systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…
Graph algorithms are at the heart of several applications, and achieving high performance with them has become critical due to the tremendous growth of irregular data. However, irregular algorithms are quite challenging to parallelize…
The aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on a distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…
Generation of optimal codes is a well known problem in coding theory. Many computational approaches exist in the literature for finding record breaking codes. However generating codes with long lengths $n$ using serial algorithms is…
Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be…
Probabilistic reasoning is an essential tool for robust decision-making systems because of its ability to explicitly handle real-world uncertainty, constraints and causal relations. Consequently, researchers are developing hybrid models by…
Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they…
We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression…
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…
Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…
Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations,…
The ever-increasing need for fast data processing demands new methods for efficient query execution. Just-in-time query compilation techniques have been demonstrated to improve performance in a set of analytical tasks significantly. In this…
To deliver high performance in power limited systems, architects have turned to using heterogeneous systems, either CPU+GPU or mixed CPU-hardware systems. However, in systems with different processor types and task affinities, scheduling…
Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs.…
There is a trend towards increased specialization of data management software for performance reasons. In this paper, we study the automatic specialization and optimization of database application programs -- sequences of queries and…
This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…
The electrical and electronic engineering has used parallel programming to solve its large scale complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good…
As the demand for computational power grows, optimizing code through compilers becomes increasingly crucial. In this context, we focus on fully automatic code optimization techniques that automate the process of selecting and applying code…