Related papers: Minimizing Communication for Eigenproblems and the…

Minimizing the Arithmetic and Communication Complexity of Jacobi's Method for Eigenvalues and Singular Values: Part One -- Serial Algorithms

We analyze several versions of Jacobi's method for the symmetric eigenvalue problem. Our goal is to reduce the asymptotic cost of the algorithm as much as possible, as measured by the number of arithmetic operations performed and associated…

Numerical Analysis · Mathematics 2026-04-21 James Demmel, Hengrui Luo, Ryan Schneider, Yifu Wang

Communication-optimal Parallel and Sequential Cholesky Decomposition

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science 2011-02-02 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem

Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-19 Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler

Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

Minimizing Communication in Linear Algebra

In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices were too large to fit in the small, fast memory. In…

Computational Complexity · Computer Science 2011-09-20 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation

In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a $3$-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-19 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

Upper and Lower Bounds on the Cost of a Map-Reduce Computation

In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-21 Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

Communication Lower Bounds and Algorithms for Sketching with Random Dense Matrices

Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-24 Hussam Al Daas, Grey Ballard, Laura Grigori, Md Taufique Hussain, Suraj Kumar, Mohammad Marufur Rahman, Kathryn Rouse

Communication lower bounds and optimal algorithms for programs that reference arrays -- Part 1

The movement of data (communication) between levels of a memory hierarchy, or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication are of interest. Motivated by…

Classical Analysis and ODEs · Mathematics 2013-08-03 Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, Katherine Yelick

On Communication Cost of Distributed Statistical Estimation and Dimensionality

We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the…

Machine Learning · Computer Science 2014-11-11 Ankit Garg, Tengyu Ma, Huy L. Nguyen

A 3D Parallel Algorithm for QR Decomposition

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

Graph Expansion and Communication Costs of Fast Matrix Multiplication

The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication…

Data Structures and Algorithms · Computer Science 2011-09-12 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

A Parallel Distributed Algorithm for the Power SVD Method

In this work, we study how to implement a distributed algorithm for the power method in a parallel manner. Because the existing distributed power method usually updates the eigenvectors sequentially, it exhibits two obvious disadvantages: 1)…

Information Theory · Computer Science 2021-08-16 Jiaying Li, Sissi Xiaoxiao Wu, Qiang Li, Anna Scaglione

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda, Ramakrishnan Kannan

Locally Self-Adjusting Skip Graphs

We present a distributed self-adjusting algorithm for skip graphs that minimizes the average routing costs between arbitrary communication pairs by performing topological adaptation to the communication pattern. Our algorithm is fully…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-05 Sikder Huq, Sukumar Ghosh

A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Yuan Tang

Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-03 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient…

Information Theory · Computer Science 2021-03-03 Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

Improving the Bit Complexity of Communication for Distributed Convex Optimization

We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression,…

Data Structures and Algorithms · Computer Science 2024-03-29 Mehrdad Ghadiri, Yin Tat Lee, Swati Padmanabhan, William Swartworth, David Woodruff, Guanghao Ye

Randomized Distributed Mean Estimation: Accuracy vs Communication

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-24 Jakub Konečný, Peter Richtárik