MODC: Resilience for disaggregated memory architectures using task-based programming

Distributed, Parallel, and Cluster Computing 2021-09-14 v1

Abstract

Disaggregated memory architectures provide benefits to applications beyond traditional scale-out environments, such as independent scaling of compute and memory resources. They also provide an independent failure model, where computations or the compute nodes they run on may fail independently of the disaggregated memory; thus, data resident in the disaggregated memory is unaffected by a compute failure. Blind application of traditional resilience techniques (e.g., checkpoints or data replication) does not take advantage of these architectures. To demonstrate the potential benefit of these architectures for resilience, we develop Memory-Oriented Distributed Computing (MODC), a framework for programming disaggregated architectures that borrows and adapts ideas from task-based programming models, concurrent programming techniques, and lock-free data structures. This framework includes a task-based application programming model and a runtime system that provides scheduling, coordination, and fault tolerance mechanisms. We present highlights of our MODC prototype and experimental results demonstrating that MODC-style resilience outperforms a checkpoint-based approach in the face of failures.
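
The independent failure model described above can be illustrated with a small, single-process sketch. It is not MODC's API: `SharedMemory`, `cas_state`, `run_worker`, and `recover` are hypothetical names chosen for illustration, a Python lock stands in for the atomic compare-and-swap that real disaggregated memory would provide, and the recovery policy (returning a failed worker's in-flight task to the pending pool) is an assumption rather than the paper's mechanism. The sketch only shows that task state and results kept in memory that outlives a worker allow another worker to resume without restoring a checkpoint.

```python
import threading

PENDING, RUNNING, DONE = "pending", "running", "done"


class SharedMemory:
    """Stand-in for disaggregated memory: it outlives any single worker."""

    def __init__(self, inputs):
        self._lock = threading.Lock()  # emulates an atomic compare-and-swap
        self.inputs = list(inputs)
        self.state = [PENDING] * len(self.inputs)
        self.owner = [None] * len(self.inputs)
        self.results = [None] * len(self.inputs)

    def cas_state(self, i, expected, new, owner=None):
        """Atomically move task i from `expected` to `new`; report success."""
        with self._lock:
            if self.state[i] != expected:
                return False
            self.state[i] = new
            self.owner[i] = owner
            return True


def run_worker(mem, worker_id, fail_after=None):
    """Claim and run tasks; optionally 'crash' after completing fail_after tasks."""
    claimed = 0
    for i in range(len(mem.inputs)):
        if not mem.cas_state(i, PENDING, RUNNING, worker_id):
            continue  # task already claimed or finished by someone else
        claimed += 1
        if fail_after is not None and claimed > fail_after:
            return  # simulated crash: task i is left in the RUNNING state
        mem.results[i] = mem.inputs[i] ** 2  # the "work" of a task
        mem.cas_state(i, RUNNING, DONE, worker_id)


def recover(mem, failed_worker):
    """Return tasks stranded by a failed worker to the pending pool."""
    for i in range(len(mem.inputs)):
        if mem.owner[i] == failed_worker:
            # Only in-flight tasks are reset; completed ones fail the CAS.
            mem.cas_state(i, RUNNING, PENDING)


mem = SharedMemory(range(6))
run_worker(mem, "w0", fail_after=2)  # finishes two tasks, crashes holding a third
done_before = mem.state.count(DONE)
recover(mem, "w0")
run_worker(mem, "w1")                # a second worker finishes the remaining tasks
```

In this sketch the two tasks completed before the simulated crash are never recomputed; only the in-flight task is re-executed, which is the intuition behind preferring memory-resident task state over rolling back to a checkpoint.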

Cite

@article{arxiv.2109.05329,
  title  = {MODC: Resilience for disaggregated memory architectures using task-based programming},
  author = {Kimberly Keeton and Sharad Singhal and Haris Volos and Yupu Zhang and Ramesh Chandra Chaurasiya and Clarete Riana Crasta and Sherin T George and Nagaraju K N and Mashood Abdulla K and Kavitha Natarajan and Porno Shome and Sanish Suresh},
  journal= {arXiv preprint arXiv:2109.05329},
  year   = {2021}
}

Comments

9 pages, 4 figures