Related papers: Exploiting Redundant Computation in Communication-…
This paper presents a fault-tolerant algorithm for the QR factorization of general matrices. It relies on the communication-avoiding algorithm, and uses the structure of the reduction of each part of the computation to introduce…
When a computational task tolerates a relaxation of its specification or when an algorithm tolerates the effects of noise in its execution, hardware, programming languages, and system software can trade deviations from correct behavior for…
As techniques for fault-tolerant quantum computation keep improving, it is natural to ask: what is the fundamental lower bound on redundancy? In this paper, we obtain a lower bound on the redundancy required for $\epsilon$-accurate…
In this paper we provide a basic introduction to the core ideas and theories surrounding fault-tolerant quantum computation. These concepts underlie the theoretical framework of large-scale quantum computation and communications and are the…
Fault tolerance is essential for building reliable services; however, it comes at the price of redundancy, mainly the "replication factor" and "diversity". With the increasing reliance on Internet-based services, more machines (mainly…
Fault tolerant quantum computing methods which work with efficient quantum error correcting codes are discussed. Several new techniques are introduced to restrict accumulation of errors before or during the recovery. Classes of eligible…
We evaluate strategies for reducing the run time of fault-tolerant quantum computations, targeting practical utility in scientific or industrial workflows. Delivering a technology with broad impact requires scaling devices, while also…
Application partitioning and code offloading have been researched extensively over the past few years. Several frameworks for code offloading have been proposed. However, fewer works have attempted to address issues that occur with its…
Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithm-Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel…
This paper continues to develop a fault-tolerant extension of the sparse grid combination technique recently proposed in [B. Harding and M. Hegland, ANZIAM J., 54 (CTAC2012), pp. C394-C411]. The approach is novel for two reasons: first, it…
Modular architectures offer a scalable path toward fault-tolerant quantum computing by interconnecting smaller quantum processing units (QPUs) provided that high-rate, fault-tolerant interfaces can be realized across modules. We present a…
The explosion of concurrency in today's hardware and the explosion of data we build upon in both machine learning and scientific simulations have a multifaceted impact on how we write our code. They have changed our notion of performance and,…
Imperfect measurement can degrade a quantum error correction scheme. A solution that restores fault tolerance is to add redundancy to the process of syndrome extraction. In this work, we show how to optimize this process for an arbitrary…
Fault-tolerant schemes can use error correction to make a quantum computation arbitrarily accurate, provided that errors per physical component are smaller than a certain threshold and independent of the computer size. However in…
Resilient algorithms in high-performance computing are subject to rigorous non-functional constraints. Resiliency must not increase the runtime, memory footprint or I/O demands too significantly. We propose a task-based soft error detection…
In this paper we propose a generalized R redundancy cycle technique that provides optical networks with almost fault-tolerant communications. More importantly, when applied using only single cycles rather than the standard paired cycles, the…
Dating back to the seminal work of von Neumann [von Neumann, Automata Studies, 1956], it is known that error correcting codes can overcome faulty circuit components to enable robust computation. Choosing an appropriate code is non-trivial…
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource, one that is, however, also unreliable, heterogeneous, and opportunistic. This type of resource has been used…
A leading approach to algorithm design aims to minimize the number of operations in an algorithm's compilation. One intuitively expects that reducing the number of operations may decrease the chance of errors. This paradigm is particularly…