Related papers: Algorithmic patterns for $\mathcal{H}$-matrices on…

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

On Linear Learning with Manycore Processors

A new generation of manycore processors is on the rise that offers dozens and more cores on a chip and, in a sense, fuses host processor and accelerator. In this paper we target the efficient training of generalized linear models on these…

Performance · Computer Science 2021-10-29 Eliza Wszola, Celestine Mendler-Dünner, Martin Jaggi, Markus Püschel

Highly Parallel Sparse Matrix-Matrix Multiplication

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç, John R. Gilbert

Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression

Hierarchical matrices are space and time efficient representations of dense matrices that exploit the low rank structure of matrix blocks at different levels of granularity. The hierarchically low rank block partitioning produces…

Data Structures and Algorithms · Computer Science 2019-02-06 Wajih Halim Boukaram, George Turkiyyah, David E. Keyes

Manycore parallel computing for a hybridizable discontinuous Galerkin nested multigrid method

We present a parallel computing strategy for a hybridizable discontinuous Galerkin (HDG) nested geometric multigrid (GMG) solver. Parallel GMG solvers require a combination of coarse-grain and fine-grain parallelism to improve time to…

Numerical Analysis · Mathematics 2019-07-18 M. S. Fabien, M. G. Knepley, R. T. Mills, B. M. Riviere

Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

The vision of a supercomputer at every desk can be realized by powerful and highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for the highly computational intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-16 Chittampally Vasanth Raja, Srinivas Balasubramanian, Prakash S Raghavendra

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-30 Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin

A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional…

Mathematical Software · Computer Science 2012-05-15 Paolo D'Alberto

Programming Massively Parallel Architectures using MARTE: a Case Study

Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially the GPUs (Graphics…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-03-28 Wendell Rodrigues, Frédéric Guyomarc'h, Jean-Luc Dekeyser

Manycore processing of repeated range queries over massive moving objects observations

The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised…

Databases · Computer Science 2014-11-13 Francesco Lettich, Salvatore Orlando, Claudio Silvestri, Christian S. Jensen

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which…

Mathematical Software · Computer Science 2015-02-27 Pieter Ghysels, Xiaoye S. Li, Francois-Henry Rouet, Samuel Williams, Artem Napov

A novel and scalable Multigrid algorithm for many-core architectures

Multigrid algorithms are among the fastest iterative methods known today for solving large linear and some non-linear systems of equations. Greatly optimized for serial operation, they still have a great potential for parallelism not fully…

Numerical Analysis · Computer Science 2011-08-11 Julian Becerra-Sagredo, Carlos Malaga, Francisco Mandujano

Many-core applications to online track reconstruction in HEP experiments

Interest in parallel architectures applied to real time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphic Processing Units (GPUs) and Intel Many Integrated Core…

Instrumentation and Detectors · Physics 2014-11-26 S. Amerio, D. Bastieri, M. Corvo, A. Gianelle, W. Ketchum, T. Liu, A. Lonardo, D. Lucchesi, S. Poprocki, R. Rivera, L. Tosoratto, P. Vicini, P. Wittich

Efficient hybrid topology optimization using GPU and homogenization based multigrid approach

We propose a new hybrid topology optimization algorithm based on multigrid approach that combines the parallelization strategy of CPU using OpenMP and heavily multithreading capabilities of modern Graphics Processing Units (GPU). In…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-01 Arya Prakash Padhi, Souvik Chakraborty, Anupam Chakrabarti, Rajib Chowdhury

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

Heterogeneous Multi core processors for improving the efficiency of Market basket analysis algorithm in data mining

Heterogeneous multi core processors can offer diverse computing capabilities. The efficiency of Market Basket Analysis Algorithm can be improved with heterogeneous multi core processors. Market basket analysis algorithm utilises apriori…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-09-24 Aashiha Priyadarshni. L

A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our…

Mathematical Software · Computer Science 2018-07-02 Helmut Harbrecht, Peter Zaspel

An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs

Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Qianxiang Ma, Rio Yokota

H2Opus: A distributed-memory multi-GPU software package for non-local operators

Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Stefano Zampini, Wajih Boukaram, George Turkiyyah, Omar Knio, David E. Keyes

H2-MG: A multigrid method for hierarchical rank structured matrices

This paper presents a new fast iterative solver for large systems involving kernel matrices. Advantageous aspects of H2 matrix approximations and the multigrid method are hybridized to create the H2-MG algorithm. This combination provides…

Numerical Analysis · Mathematics 2025-09-12 Daria Sushnikova, George Turkiyyah, Edmond Chow, David Keyes