Related papers: Code Optimization on Kepler GPUs and Xeon Phi

200 papers

NVIDIA's new architecture, Kepler, improves GPU performance significantly with the new streaming multiprocessor SMX. Along with the performance, NVIDIA has also introduced many new technologies such as dynamic parallelism, Hyper-Q and GPU…

Recently, NVIDIA released a new GPU model, the GTX Titan X (TX), in the lineage of the Maxwell architecture. We use our conjugate gradient code and non-perturbative renormalization code to measure the performance of TX. The results are compared…

High Energy Physics - Lattice · Physics 2015-11-03 Hwancheol Jeong, Sangbaek Lee, Weonjong Lee, Jeonghwan Pak, Jangho Kim, Juhyun Chung

Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon…

Computational Physics · Physics 2014-11-18 O. Kaczmarek, C. Schmidt, P. Steinbrecher, M. Wagner

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-09-05 O. Kaczmarek, C. Schmidt, P. Steinbrecher, Swagato Mukherjee, M. Wagner

Optimizing the geometry of airfoils for a specific application is an important engineering problem. In this context, genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-14 Lukas Einkemmer

Recently Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a…

Instrumentation and Methods for Astrophysics · Physics 2017-03-30 Bin Chen, Ronald Kantowski, Xinyu Dai, Eddie Baron, Paul Van der Mark

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to…

High Energy Physics - Lattice · Physics 2017-12-04 Ruizi Li, Carleton DeTar, Steven Gottlieb, Doug Toussaint

Graphics Processing Units (GPUs) are being used in many areas of physics, since the performance versus cost is very attractive. The GPUs can be addressed by CUDA, which is NVIDIA's parallel computing architecture. It enables dramatic…

High Energy Physics - Lattice · Physics 2012-10-12 Nuno Cardoso, Marco Cardoso, Pedro Bicudo

We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable GPGPUs. We first introduce a simple,…

Hardware Architecture · Computer Science 2017-12-26 Nirmal Prajapati, Sanjay Rajopadhye, Hristo Djidjev, Nandkishore Santhi, Tobias Grosser, Rumen Andonov

The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using…

High Energy Physics - Lattice · Physics 2016-12-26 M. A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Gambhir, Richard Brower

In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-28 Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, Stefano Markidis

We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware. These clusters are directly…

Instrumentation and Methods for Astrophysics · Physics 2013-12-09 P. Berczik, R. Spurzem, L. Wang, S. Zhong, O. Veles, I. Zinchenko, S. Huang, M. Tsai, G. Kennedy, S. Li, L. Naso, C. Li

This work studies the porting and optimization of the tensor network simulator QTensor on GPUs, with the ultimate goal of simulating quantum circuits efficiently at scale on large GPU supercomputers. We implement NumPy, PyTorch, and CuPy…

Quantum Physics · Physics 2022-04-14 Danylo Lykov, Angela Chen, Huaxuan Chen, Kristopher Keipert, Zheng Zhang, Tom Gibbs, Yuri Alexeev

We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-15 George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz

In this study, the gravitational octree code originally optimized for the Fermi, Kepler, and Maxwell GPU architectures is adapted to the Volta architecture. The Volta architecture introduces independent thread scheduling requiring either…

Mathematical Software · Computer Science 2018-11-08 Yohei Miki

We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics…

Strongly Correlated Electrons · Physics 2016-09-21 Topi Siro, Ari Harju

Today, one of the main challenges for high-performance computing systems is to improve their performance by keeping energy consumption at acceptable levels. In this context, a consolidated strategy consists of using accelerators such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-18 Manuel Costanzo, Enzo Rucci, Ulises Costi, Franco Chichizola, Marcelo Naiouf

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision…

High Energy Physics - Lattice · Physics 2010-12-06 Ronald Babich, Michael A. Clark, Bálint Joó

We present a scheme for the parallelization of quantum Monte Carlo on graphical processing units, focusing on bosonic systems and variational Monte Carlo. We use asynchronous execution schemes with shared memory persistence, and obtain an…

Computational Physics · Physics 2014-12-10 Y. Lutsyshyn

In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-11 Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra