Related papers: Communication-Avoiding Parallel Algorithms for Sol…

200 papers

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Yuan Tang

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-03 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

A parallel algorithm for solving a series of matrix equations with a constant tridiagonal matrix and different right-hand sides is proposed and studied. The process of solving the problem is represented in two steps. The first preliminary…

Numerical Analysis · Mathematics 2010-12-07 Andrew Terekhov

We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms…

Numerical Analysis · Mathematics 2021-09-08 Hussam Al Daas, Grey Ballard, Peter Benner

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference.…

Data Structures and Algorithms · Computer Science 2020-06-26 Guy E. Blelloch, Jeremy T. Fineman, Yan Gu, Yihan Sun

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Verite

This paper proposes a combination of a hybrid CPU-GPU and a pure GPU software implementation of a direct algorithm for solving shifted linear systems $(A - \sigma I)X = B$ with a large number of complex shifts $\sigma$ and multiple…

Mathematical Software · Computer Science 2017-08-24 Nela Bosner, Zvonimir Bujanović, Zlatko Drmač

Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-18 Edward Hutter, Edgar Solomonik

Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes…

Data Structures and Algorithms · Computer Science 2012-02-16 Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science 2011-02-02 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Solving linear discrete ill-posed problems for third order tensor equations based on a tensor t-product has attracted much attention. But when the data tensor is produced continuously, current algorithms are not time-saving. Here, we…

Numerical Analysis · Mathematics 2021-11-30 Zhengbang Cao, Pengpeng Xie

The LMS algorithm is one of the most successful adaptive filtering algorithms. It uses the instantaneous value of the square of the error signal as an estimate of the mean-square error (MSE). The LMS algorithm changes (adapts) the filter…

Other Computer Science · Computer Science 2011-04-22 Nasrin Akhter, Kaniz Fatema, Lilatul Ferdouse, Faria Khandaker

The parallel alternating direction method of multipliers (ADMM) algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solving statistical…

Machine Learning · Statistics 2023-11-22 Xiaofei Wu, Zhimin Zhang, Zhenyu Cui

We are interested in parallelizing the Least Angle Regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic LARS algorithm. The two…

Machine Learning · Computer Science 2020-09-15 S. Das, J. Demmel, K. Fountoulakis, L. Grigori, M. W. Mahoney, S. Yang

Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-19 Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler

We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-04-21 Mathias Jacquelin, Lin Lin, Nathan Wichmann, Chao Yang

In the realm of Large Language Model (LLM) inference, the inherent structure of transformer models coupled with the multi-GPU tensor parallelism strategy leads to a sequential execution of computation and communication. This results in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Bin Xiao, Lei Su

We propose a simple technique that, if combined with algorithms for computing functions of triangular matrices, can make them more efficient. Basically, such a technique consists in a specific scaling similarity transformation that reduces…

Numerical Analysis · Mathematics 2021-11-18 João R. Cardoso, Amir Sadeghi

Efficient parallelism is necessary for achieving low-latency, high-throughput inference with large language models (LLMs). Tensor parallelism (TP) is the state-of-the-art method for reducing LLM response latency, however GPU communications…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Mert Hidayetoglu, Aurick Qiao, Michael Wyatt, Jeff Rasley, Yuxiong He, Samyam Rajbhandari

We develop and analyze new scheduling algorithms for solving sparse triangular linear systems (SpTRSV) in parallel. Our approach produces highly efficient synchronous schedules for the forward- and backward-substitution algorithm. Compared…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-06 Toni Böhnlein, Pál András Papp, Raphael S. Steiner, Christos K. Matzoros, A. N. Yzelman