Related papers: PBBFMM3D: a parallel black-box algorithm for kerne…

Giga-scale Kernel Matrix Vector Multiplication on GPU

Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these…

Numerical Analysis · Mathematics 2025-02-25 Robert Hu, Siu Lun Chau, Dino Sejdinovic, Joan Alexis Glaunès

A parallel directional Fast Multipole Method

This paper introduces a parallel directional fast multipole method (FMM) for solving N-body problems with highly oscillatory kernels, with a focus on the Helmholtz kernel in three dimensions. This class of oscillatory kernels requires a…

Numerical Analysis · Mathematics 2018-01-08 Austin R. Benson, Jack Poulson, Kenneth Tran, Björn Engquist, Lexing Ying

An O(N) and parallel approach to integral problems by a kernel-independent fast multipole method: Application to polarization and magnetization of interacting particles

Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct…

Computational Physics · Physics 2016-08-15 Xikai Jiang, Jiyuan Li, Xujun Zhao, Jian Qin, Dmitry Karpeev, Juan Hernandez-Ortiz, Juan de Pablo, Olle Heinonen

Performance Acceleration of Kernel Polynomial Method Applying Graphics Processing Units

The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in research fields of condensed matter physics and chemistry. The algorithm has a difficulty to be parallelized on a…

Computational Physics · Physics 2011-05-30 Shixun Zhang, Shinichi Yamagiwa, Masahiko Okumura, Seiji Yunoki

Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning

The ever-increasing data demand craves advancements in high-speed and energy-efficient computing hardware. Analog optical neural network (ONN) processors have emerged as a promising solution, offering benefits in bandwidth and energy…

Optics · Physics 2026-04-07 Chao Luan, Ronald Davis, Zaijun Chen, Dirk Englund, Ryan Hamerly

Kernel Aggregated Fast Multipole Method: Efficient summation of Laplace and Stokes kernel functions

Many different simulation methods for Stokes flow problems involve a common computationally intense task -- the summation of a kernel function over $O(N^2)$ pairs of points. One popular technique is the Kernel Independent Fast Multipole…

Numerical Analysis · Mathematics 2021-09-07 Wen Yan, Robert Blackwell

Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector Multiplication

Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all…

Machine Learning · Computer Science 2023-12-04 Franziska Nestler, Martin Stoll, Theresa Wagner

Reducing the Complexity of Matrix Multiplication to $O(N^2 \log_2 N)$ by an Asymptotically Optimal Quantum Algorithm

Matrix multiplication is a fundamental classical computing operation whose efficiency becomes a major challenge at scale, especially for machine learning applications. Quantum computing, with its inherent parallelism and exponential storage…

Quantum Physics · Physics 2026-02-10 Jiaqi Yao, Ding Liu

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the…

Machine Learning · Statistics 2015-04-17 Vikas Sindhwani, Haim Avron

Parallel Support Vector Machines in Practice

In this paper, we evaluate the performance of various parallel optimization methods for Kernel Support Vector Machines on multicore CPUs and GPUs. In particular, we provide the first comparison of algorithms with explicit and implicit…

Machine Learning · Computer Science 2014-04-04 Stephen Tyree, Jacob R. Gardner, Kilian Q. Weinberger, Kunal Agrawal, John Tran

Parallel Sparse Matrix Multiplication for Linear Scaling Electronic Structure Calculations

Linear-scaling electronic-structure techniques, also called O(N) techniques, rely heavily on the multiplication of sparse matrices, where the sparsity arises from spatial cut-offs. In order to treat very large systems, the calculations must…

Materials Science · Physics 2009-10-31 D. R. Bowler, T. Miyazaki, M. J. Gillan

Parallel Computation of functions of matrices and their action on vectors

We present a novel class of methods to compute functions of matrices or their action on vectors that are suitable for parallel programming. Solving appropriate simple linear systems of equations in parallel (or computing the inverse of…

Numerical Analysis · Mathematics 2022-10-10 Sergio Blanes

A SVD accelerated kernel-independent fast multipole method and its application to BEM

The kernel-independent fast multipole method (KIFMM) proposed in [1] is of almost linear complexity. In the original KIFMM the time-consuming M2L translations are accelerated by FFT. However, when more equivalent points are used to achieve…

Numerical Analysis · Computer Science 2015-03-19 Yanchuang Cao, Lihua Wen, Junjie Rong

Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices

The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans, Karl Meerbergen, Raf Vandebril

The Fast Kernel Transform

Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods are computations involving the kernel matrix that naively scale quadratically…

Machine Learning · Computer Science 2021-06-09 John Paul Ryan, Sebastian Ament, Carla P. Gomes, Anil Damle

Fully parallel optical matrix-matrix multiplication

In recent years, with the rapid development of electro-optic modulators, optical computing has become a potential excellent candidate for various computing tasks. New structures and devices for optical computing are emerging one after…

Optics · Physics 2023-09-20 Yufeng Zhang, Hao Yan, Kaizhi Wang

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

A parallel butterfly algorithm

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int K(x,y) g(y)\,dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted…

Numerical Analysis · Mathematics 2013-11-26 Jack Poulson, Laurent Demanet, Nicholas Maxwell, Lexing Ying

Direct tensor processing with coherent light

Tensor processing is the cornerstone of modern technological advancements, powering critical applications in data analytics and artificial intelligence. While optical computing offers exceptional advantages in bandwidth, parallelism, and…

Optics · Physics 2025-06-18 Yufeng Zhang, Xiaobing Liu, Chenguang Yang, Jinlong Xiang, Hao Yan, Tianjiao Fu, Kaizhi Wang, Yikai Su, Zhipei Sun, Xuhan Guo

UPMEM Unleashed: Software Secrets for Speed

Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. Although software development kits (SDKs) for PIM, such as the UPMEM SDK, provide…

Hardware Architecture · Computer Science 2025-10-21 Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski