Related papers: Domain Decomposition method on GPU cluster
We show that using the multi-splitting algorithm as a preconditioner for the domain wall Dirac linear operator, arising in lattice QCD, effectively reduces the inter-node communication cost, at the expense of performing more on-node…
Efficient algorithms for the solution of partial differential equations on parallel computers are often based on domain decomposition methods. Schwarz preconditioners combined with standard Krylov space solvers are widely used in this…
The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse…
We discuss the implementation and optimization challenges for a Wilson-Dirac solver with Clover term on QPACE, a parallel machine based on Cell processors and a torus network. We choose the mixed-precision Schwarz preconditioned FGCR…
Substructured domain decomposition (DD) methods have been extensively studied, and they are usually associated with nonoverlapping decompositions. We introduce here a substructured version of Restricted Additive Schwarz (RAS) which we call…
This paper focuses on the development of a two-level preconditioner based on a fully algebraic enhancement of a Schwarz domain decomposition method. We consider the pure divergence of a Restricted Additive Schwarz iterative process…
Randomized neural networks (RaNNs), in which hidden layers remain fixed after random initialization, provide an efficient alternative for parameter optimization compared to fully parameterized networks. In this paper, RaNNs are integrated…
Sparse linear systems are typically solved using preconditioned iterative methods, but applying preconditioners via sparse triangular solves introduces bottlenecks due to irregular memory accesses and data dependencies. This work leverages…
This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…
Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success…
Solving discretized versions of the Dirac equation represents a large share of execution time in lattice Quantum Chromodynamics (QCD) simulations. Many high-performance computing (HPC) clusters use graphics processing units (GPUs) to offer…
We investigate the application of the additive overlapping Schwarz domain decomposition method as a preconditioner for the large sparse linear systems arising in graph-based nonlinear least-squares problems, specifically the pose-graph…
We introduce a two-level hybrid restricted additive Schwarz (RAS) preconditioner for heterogeneous steady-state convection-diffusion equations at high Péclet numbers. Our construction builds on the multiscale spectral generalized finite…
Solving the normal equations corresponding to large sparse linear least-squares problems is an important and challenging problem. For very large problems, an iterative solver is needed and, in general, a preconditioner is required to…
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the…
The openQ*D code has been used by the RC* collaboration for the generation of fully dynamical QCD+QED gauge configurations with C* boundary conditions. In this talk, optimization of solvers provided with the openQ*D…
The GPU as a digital signal processing accelerator for cloud RAN is investigated. A new design for a 5G NR low density parity check code decoder running on a GPU is presented. The algorithm is flexibly adaptable to GPU architecture to…
Our research focuses on the development of domain decomposition preconditioners tailored for second-order elliptic partial differential equations. Our approach addresses two major challenges simultaneously: i) effectively handling…
Parallel algorithms and simulators with good scalability are particularly important for large-scale reservoir simulations on modern supercomputers with a large number of processors. In this paper, we introduce and study a family of highly…
We study the algorithmic optimization and performance tuning of the Lattice QCD clover-fermion solver for the K computer. We implement Lüscher's SAP preconditioner with sub-blocking in which the lattice block in a node is further…