Related papers: GPU Algorithms for Efficient Exascale Discretizati…
Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of…
Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…
The MFEM (Modular Finite Element Methods) library is a high-performance C++ library for finite element discretizations. MFEM supports numerous types of finite element methods and is the discretization engine powering many computational…
In this paper, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems. The methods described here allow for the construction of…
GPU computing is expected to play an integral part in all modern Exascale supercomputers. It is also expected that higher order Godunov schemes will make up about a significant fraction of the application mix on such supercomputers. It is,…
We demonstrate NekRS performance results on various advanced GPU architectures. NekRS is a GPU-accelerated version of Nek5000 that targets high performance on exascale platforms. It is being developed in DOE's Center of Efficient Exascale…
It has been widely accepted that Graphics Processing Units (GPU) is one of promising schemes for encryption acceleration, in particular, the support of complex mathematical calculations such as integer and logical operations makes the…
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…
In this thesis we develop techniques to efficiently solve numerical Partial Differential Equations (PDEs) using Graphical Processing Units (GPUs). Focus is put on both performance and re--usability of the methods developed, to this end a…
The goal of this work is to parallelize the multistep scheme for the numerical approximation of the backward stochastic differential equations (BSDEs) in order to achieve both, a high accuracy and a reduction of the computation time as…
This dissertation presents the design, implementation and evaluation of GPU-accelerated simulation frameworks for Evolutionary Spatial Cyclic Games (ESCGs), a class of agent-based models used to study ecological and evolutionary dynamics.…
The paper considers the problem of implementation on graphics processors of numerical integration routines for higher order finite element approximations. The design of suitable GPU kernels is investigated in the context of general purpose…
This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…
This paper presents an overview of the functionalities and applications of Exasim, an open-source code for generating high-order discontinuous Galerkin codes to numerically solve parametrized partial differential equations (PDEs). The…
Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general purpose graphics processing units (GPUs). However, the computational performance of such…
Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing to achieve an Exascale performance. The main goal of application developers of GPU is to tune their code extensively to obtain optimal performance,…
Customizing Convolution Neural Networks (CNN) for production use has been a challenging task for DL practitioners. This paper intends to expedite the model customization with a model hub that contains the optimized models tiered by their…
We consider differential Lyapunov and Riccati equations, and generalized versions thereof. Such equations arise in many different areas and are especially important within the field of optimal control. In order to approximate their…
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than todays systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…
Graph representation is a powerful abstraction of real-world objects and relations. Computing the Graph Edit Distance (GED) between graphs is critical in domains such as bioinformatics, machine learning, and pattern recognition. GED…