Related papers: Accelerating Pythonic coupled cluster implementati…

200 papers

In this work, we introduce new batching algorithms to effectively handle large contractions encountered in coupled-cluster singles and doubles (CCSD) implementations in Python on the Video Random Access Memory (VRAM) of graphical processing…

The main objective of this work is to analyze the sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite…

Numerical Analysis · Mathematics 2021-08-31 Abal-Kassim Cheik Ahamed, Frédéric Magoulès

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in…

To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU, which utilize GPUs on the client side…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Masatoshi Hidaka, Tatsuya Harada

Gaussian processes (GPs) are a widely used regression tool, but the cubic complexity of exact solvers limits their scalability. To address this challenge, we extend the GPRat library by incorporating a fully GPU-resident GP prediction…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-24 Henrik Möllmann, Dirk Pflüger, Alexander Strack

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-06 Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Recently, successes have been achieved for the high-order gas-kinetic schemes (HGKS) on unstructured meshes for compressible flows. In this paper, to accelerate the computation, HGKS is implemented with the graphical processing unit (GPU)…

Numerical Analysis · Mathematics 2024-07-02 Yuhang Wang, Waixiang Cao, Liang Pan

We present an efficient implementation for running three-dimensional numerical simulations of fluid-structure interaction problems on single GPUs, based on Nvidia CUDA through Numba and Python. The incompressible flow around moving bodies…

Fluid Dynamics · Physics 2024-12-05 M. Guerrero-Hurtado, J. M. Catalán, M. Moriche, A. Gonzalo, O. Flores

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea, Tudorel Andrei

Unstructured mesh tallies are a bottleneck in Monte Carlo neutral particle transport simulations of fusion reactors. This paper introduces the PUMI-Tally library that takes advantage of mesh adjacency information to accelerate these tallies…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-29 Fuad Hasan, Cameron W. Smith, Mark S. Shephard, R. Michael Churchill, George J. Wilkie, Paul K. Romano, Patrick C. Shriwise, Jacob S. Merson

Convex clustering is a popular clustering model without requiring the number of clusters as prior knowledge. It can generate a clustering path by continuously solving the model with a sequence of regularization parameter values. This paper…

Optimization and Control · Mathematics 2025-01-28 Hongfei Wu, Yancheng Yuan

High fidelity Computational Fluid Dynamics simulations are generally associated with large computing requirements, which are progressively acute with each new generation of supercomputers. However, significant research efforts are required…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-07 R. Borrell, D. Dosimont, M. Garcia-Gasulla, G. Houzeaux, O. Lehmkuhl, V. Mehta, H. Owen, M. Vazquez, G. Oyarzun

In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-11 Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra

This paper presents, to the author's knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA's Compute…

Instrumentation and Methods for Astrophysics · Physics 2014-11-20 Jani Sainio

This paper proposes a versatile high-performance execution model, inspired by systolic arrays, for memory-bound regular kernels running on CUDA-enabled GPUs. We formulate a systolic model that shifts partial sums by CUDA warp primitives for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-09 Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

Using GPU-based HPC platforms efficiently for coupled cluster computations is a challenge due to heterogeneous hardware structures. The constant need to adapt software to these structures and the required man-hours make a systematization…

Chemical Physics · Physics 2025-10-07 Jan Brandejs, Johann Pototschnig, Trond Saue

We accelerated an ab-initio molecular QMC calculation by using GPGPU. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and performed on the GPU. The performance on a (single core CPU + GPU) is compared with that on a…

Computational Physics · Physics 2012-04-06 Yutaka Uejima, Tomoharu Terashima, Ryo Maezono

The convex hull is a fundamental geometrical structure for many applications where groups of points must be enclosed or represented by a convex polygon. Although efficient sequential convex hull algorithms exist, and are constantly being…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-27 Alan Keith, Héctor Ferrada, Cristóbal A. Navarro

This paper introduces and evaluates a freely available cellular nonlinear network simulator optimized for the effective use of GPUs, to achieve fast modelling and simulations. Its relevance is demonstrated for several applications in…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-23 Radu Dogaru, Ioana Dogaru