Related papers: Minimizing Communication for Eigenproblems and the…

200 papers

We analyze several versions of Jacobi's method for the symmetric eigenvalue problem. Our goal is to reduce the asymptotic cost of the algorithm as much as possible, as measured by the number of arithmetic operations performed and associated…

Numerical Analysis · Mathematics 2026-04-21 James Demmel, Hengrui Luo, Ryan Schneider, Yifu Wang

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science 2011-02-02 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-19 Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices were too large to fit in the small, fast memory. In…

Computational Complexity · Computer Science 2011-09-20 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a $3$-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-19 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-21 Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-24 Hussam Al Daas, Grey Ballard, Laura Grigori, Md Taufique Hussain, Suraj Kumar, Mohammad Marufur Rahman, Kathryn Rouse

The movement of data (communication) between levels of a memory hierarchy, or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication are of interest. Motivated by…

Classical Analysis and ODEs · Mathematics 2013-08-03 Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, Katherine Yelick

We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the…

Machine Learning · Computer Science 2014-11-11 Ankit Garg, Tengyu Ma, Huy L. Nguyen

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication…

Data Structures and Algorithms · Computer Science 2011-09-12 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

In this work, we study how to implement a distributed algorithm for the power method in a parallel manner. As the existing distributed power method usually updates the eigenvectors sequentially, it exhibits two obvious disadvantages: 1)…

Information Theory · Computer Science 2021-08-16 Jiaying Li, Sissi Xiaoxiao Wu, Qiang Li, Anna Scaglione

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda, Ramakrishnan Kannan

We present a distributed self-adjusting algorithm for skip graphs that minimizes the average routing costs between arbitrary communication pairs by performing topological adaptation to the communication pattern. Our algorithm is fully…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-05 Sikder Huq, Sukumar Ghosh

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Yuan Tang

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-03 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient…

Information Theory · Computer Science 2021-03-03 Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression,…

Data Structures and Algorithms · Computer Science 2024-03-29 Mehrdad Ghadiri, Yin Tat Lee, Swati Padmanabhan, William Swartworth, David Woodruff, Guanghao Ye

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-24 Jakub Konečný, Peter Richtárik