
Exploiting Redundant Computation in Communication-Avoiding Algorithms for Algorithm-Based Fault Tolerance

Distributed, Parallel, and Cluster Computing 2015-11-03 v1

Abstract

Communication-avoiding algorithms perform redundant computations in order to minimize the number of inter-process communications. In this paper, we propose to exploit this redundancy for fault-tolerance purposes. We illustrate this idea with the QR factorization of tall and skinny matrices (TSQR), and we evaluate the number of failures our algorithm can tolerate under different semantics.
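The kind of redundancy described above can be sketched as a small single-process simulation. This is a hedged illustration under stated assumptions, not the paper's implementation. It assumes a power-of-two number of processes and a butterfly exchange in place of a reduction tree: at step s, process p and its partner p XOR 2^s swap their R factors and both factor the same stacked pair. The helper names mgs_r, butterfly_tsqr and holders are hypothetical, and modified Gram-Schmidt stands in for whatever local QR kernel a real implementation would use.

```python
import math


def mgs_r(rows):
    """Upper-triangular R factor (positive diagonal) of a full-column-rank
    matrix given as a list of rows, computed by modified Gram-Schmidt."""
    m, n = len(rows), len(rows[0])
    cols = [[rows[i][j] for i in range(m)] for j in range(n)]  # column-major copy
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in cols[j]))
        r[j][j] = norm
        q = [x / norm for x in cols[j]]
        # Remove the q-component from every remaining column.
        for k in range(j + 1, n):
            dot = sum(a * b for a, b in zip(q, cols[k]))
            r[j][k] = dot
            cols[k] = [b - dot * a for a, b in zip(q, cols[k])]
    return r


def butterfly_tsqr(blocks):
    """Simulate TSQR on len(blocks) processes with a butterfly exchange.

    Hypothetical sketch: at step s, process p and p ^ 2**s swap their R
    factors and both compute the QR of the same stacked pair, so each
    intermediate result is computed redundantly rather than by one tree node.
    """
    nprocs = len(blocks)
    steps = nprocs.bit_length() - 1
    assert nprocs == 1 << steps, "number of processes must be a power of two"
    rs = [mgs_r(b) for b in blocks]  # local QR of each process's block
    for s in range(steps):
        new_rs = []
        for p in range(nprocs):
            partner = p ^ (1 << s)
            lo, hi = min(p, partner), max(p, partner)
            # Fixed stacking order: both partners produce bit-identical R.
            new_rs.append(mgs_r(rs[lo] + rs[hi]))
        rs = new_rs
    return rs


def holders(p, k, nprocs):
    """Processes holding a copy of process p's intermediate R after k
    butterfly exchanges: those that agree with p on bits k and above."""
    mask = (1 << k) - 1
    return [q for q in range(nprocs) if (q & ~mask) == (p & ~mask)]


if __name__ == "__main__":
    blocks = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
        [[2.0, 1.0], [0.0, 1.0]],
        [[1.0, 1.0], [4.0, 3.0]],
    ]
    rs = butterfly_tsqr(blocks)
    print("R on every process:", rs[0])
    print("all copies identical:", all(r == rs[0] for r in rs))
    print("holders of process 2's R after 1 exchange:", holders(2, 1, 4))
```

With four 2x2 blocks, every simulated process ends with the same R, which matches a direct factorization of the stacked 8x2 matrix up to rounding. After k exchanges, holders(p, k, P) lists the 2^k processes that hold a copy of p's intermediate R. This replication is what a fault-tolerant variant could draw on; how many failures it actually tolerates depends on the failure semantics the paper studies.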


Cite

@article{arxiv.1511.00212,
  title   = {Exploiting Redundant Computation in Communication-Avoiding Algorithms for Algorithm-Based Fault Tolerance},
  author  = {Camille Coti},
  journal = {arXiv preprint arXiv:1511.00212},
  year   = {2015}
}