Related papers: Temporal Vectorization for Stencils
Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…
Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization techniques, aiming at exploiting the in-core data parallelism. Briefly, they either…
Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel…
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of…
Current architectures are now equipped with matrix computation units designed to enhance AI and high-performance computing applications. Within these architectures, two fundamental instruction types are matrix multiplication and vector…
Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the…
New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…
Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping…
Stencil computation constitutes a cornerstone of scientific computing, serving as a critical kernel in domains ranging from fluid dynamics to weather simulation. While stencil computations are conventionally regarded as memory-bound and…
Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the…
Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…
Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…
Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach…
Sparse Tensor Cores offer exceptional performance gains for AI workloads by exploiting structured 2:4 sparsity. However, their potential remains untapped for core scientific workloads such as stencil computations, which exhibit irregular…
Graph embedding aims at learning a vector-based representation of vertices that incorporates the structure of the graph. This representation then enables inference of graph properties. Existing graph embedding techniques, however, do not…
Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential…
Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient algorithms suitable for…
Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving…
Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main…
Temporal Graph Clustering (TGC) is a new task with little attention, focusing on node clustering in temporal graphs. Compared with existing static graph clustering, it can find the balance between time requirement and space requirement…