
Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory

Emerging Technologies 2024-04-05 v1

Abstract

This paper describes a distributed implementation of Apache Arrow that can use cluster-shared, load-store addressable memory that is hardware-coherent only within each node. The implementation builds on the ThymesisFlow prototype, which uses the OpenCAPI interface to create a shared address space across a cluster. Apache Arrow structures are immutable, which simplifies their use in cluster shared memory; this paper creates distributed Apache Arrow tables and makes them accessible on each node.

Cite

@article{arxiv.2404.03030,
  title  = {Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory},
  author = {Philip Groet and Joost Hoozemans and Andreas Grapentin and Felix Eberhardt and Zaid Al-Ars and H. Peter Hofstee},
  journal= {arXiv preprint arXiv:2404.03030},
  year   = {2024}
}

Comments

Presented at the 3rd Workshop on Heterogeneous Composable and Disaggregated Systems (HCDS 2024)
