Related papers: A Mixed Precision, Multi-GPU Design for Large-scal…

Solving Large Top-K Graph Eigenproblems with a Memory and Compute-optimized FPGA Design

Large-scale eigenvalue computations on sparse matrices are a key component of graph analytics techniques based on spectral methods. In such applications, an exhaustive computation of all eigenvalues and eigenvectors is impractical and…

Hardware Architecture · Computer Science 2021-03-19 Francesco Sgherzi, Alberto Parravicini, Marco Siracusa, Marco Domenico Santambrogio

Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-16 Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang

Mixed precision matrix interpolative decompositions for model reduction

Renewed interest in mixed-precision algorithms has emerged due to growing data capacity and bandwidth concerns, as well as the advancement of GPUs, which enable significant speedup for low precision arithmetic. In light of this, we propose…

Numerical Analysis · Mathematics 2020-12-14 Alec Michael Dunton, Alyson Fox

Multi-GPU Graph Analytics

We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-02 Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens

A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms

Spectral clustering is one of the most popular graph clustering algorithms, which achieves the best performance for many scientific and engineering applications. However, existing implementations in commonly used software platforms such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-14 Yu Jin, Joseph F. JaJa

Accelerating Scientific Computations with Mixed Precision Algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and…

Mathematical Software · Computer Science 2015-05-13 Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, Stanimire Tomov

Multi GPU Performance of Conjugate Gradient Solver with Staggered Fermions in Mixed Precision

GPU has a significantly higher performance in single-precision computing than that of double precision. Hence, it is important to take a maximal advantage of the single precision in the CG inverter, using the mixed precision method. We have…

Computational Physics · Physics 2011-11-02 Yong-Chull Jang, Hyung-Jin Kim, Weonjong Lee

Mixed precision in Graphics Processing Unit

Modern graphics computing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled (and promises) significant advantages, both in terms of energy performance and calculation. In…

Hardware Architecture · Computer Science 2021-10-26 Quentin Gallouédec

Efficient and High-quality Sparse Graph Coloring on the GPU

Graph coloring has been broadly used to discover concurrency in parallel computing. To speedup graph coloring for large-scale datasets, parallel algorithms have been proposed to leverage modern GPUs. Existing GPU implementations either have…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Xuhao Chen, Pingfan Li, Jianbin Fang, Tao Tang, Zhiying Wang, Canqun Yang

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science 2012-12-24 Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

High Accuracy Low Precision QR Factorization and Least Square Solver on GPU with TensorCore

Driven by the insatiable needs to process ever larger amount of data with more complex models, modern computer processors and accelerators are beginning to offer half precision floating point arithmetic support, and extremely optimized…

Mathematical Software · Computer Science 2019-12-12 Shaoshuai Zhang, Panruo Wu

Efficient Mixed-Precision Matrix Factorization of the Inverse Overlap Matrix in Electronic Structure Calculations with AI-Hardware and GPUs

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics 2024-05-01 Adela Habib, Joshua Finkelstein, Anders M. N. Niklasson

GPU-Accelerated Algorithms for Process Mapping

Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-16 Petr Samoldekin, Christian Schulz, Henning Woydt

GPU Load Balancing

Fine-grained workload and resource balancing is the key to high performance for regular and irregular computations on the GPUs. In this dissertation, we conduct an extensive survey of existing load-balancing techniques to build an…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-20 Muhammad Osama

GPU accelerated maximum cardinality matching algorithms for bipartite graphs

We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-07 Mehmet Deveci, Kamer Kaya, Bora Ucar, Umit V. Catalyurek

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Reduced and mixed precision turbulent flow simulations using explicit finite difference schemes

The use of reduced and mixed precision computing has gained increasing attention in high-performance computing (HPC) as a means to improve computational efficiency, particularly on modern hardware architectures like GPUs. In this work, we…

Computational Engineering, Finance, and Science · Computer Science 2025-05-28 Bálint Siklósi, Pushpender K. Sharma, David J. Lusher, István Z. Reguly, Neil D. Sandham

Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures

Large scale-free graphs are famously difficult to process efficiently: the skewed vertex degree distribution makes it difficult to obtain balanced partitioning. Our research instead aims to turn this into an advantage by partitioning the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-05 Scott Sallinen, Abdullah Gharaibeh, Matei Ripeanu

GPU acceleration of splitting schemes applied to differential matrix equations

We consider differential Lyapunov and Riccati equations, and generalized versions thereof. Such equations arise in many different areas and are especially important within the field of optimal control. In order to approximate their…

Numerical Analysis · Mathematics 2018-10-23 Hermann Mena, Lena-Maria Pfurtscheller, Tony Stillfjord

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam