Related papers: Leveraging Apache Arrow for Zero-copy, Zero-serial…

Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications

The concept of memory disaggregation has recently been gaining traction in research. With memory disaggregation, data center compute nodes can directly access memory on adjacent nodes and are therefore able to overcome local memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-28 Robin Abrahamse, Akos Hadnagy, Zaid Al-Ars

Zero-Cost, Arrow-Enabled Data Interface for Apache Spark

Distributed data processing ecosystems are widespread and their components are highly specialized, such that efficient interoperability is urgent. Recently, Apache Arrow was chosen by the community to serve as a format mediator, providing…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-30 Sebastiaan Alvarez Rodriguez, Jayjeet Chakraborty, Aaron Chu, Ivo Jimenez, Jeff LeFevre, Carlos Maltzahn, Alexandru Uta

Towards an Arrow-native Storage System

With the ever-increasing dataset sizes, several file formats like Parquet, ORC, and Avro have been developed to store data efficiently and to save network and interconnect bandwidth at the price of additional CPU utilization. However, with…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Jayjeet Chakraborty, Ivo Jimenez, Sebastiaan Alvarez Rodriguez, Alexandru Uta, Jeff LeFevre, Carlos Maltzahn

Zerrow: True Zero-Copy Arrow Pipelines in Bauplan

Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero copy format", in practice, limited Linux kernel…

Operating Systems · Computer Science 2025-05-15 Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter

Benchmarking Apache Arrow Flight -- A wire-speed protocol for data transfer, querying and microservices

Moving structured data between different big data frameworks and/or data warehouses/storage systems often cause significant overhead. Most of the time more than 80% of the total time spent in accessing data is elapsed in…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-11 Tanveer Ahmad, Zaid Al Ars, H. Peter Hofstee

A PGAS Communication Library for Heterogeneous Clusters

This work presents a heterogeneous communication library for clusters of processors and FPGAs. This library, Shoal, supports the Partitioned Global Address Space (PGAS) memory model for applications. PGAS is a shared memory model for…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-27 Varun Sharma, Paul Chow

Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?

Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao

AAFLOW: Scalable Patterns for Agentic AI Workflows

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Arup Kumar Sarker, Mills Staylor, Aymen Alsaadi, Gregor von Laszewski, Shantenu Jha, Geoffrey Fox

Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture

Existing large language model (LLM) serving systems typically employ Prefill-Decode disaggregated architecture to prevent computational interference between the prefill and decode phases. However, in real-world LLM serving scenarios,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-07 Yu Wu, Tongxuan Liu, Yuting Zeng, Siyu Wu, Jun Xiong, Xianzhe Dong, Hailong Yang, Ke Zhang, Jing Li

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education

This paper introduces Archer, a community-based computing resource for computer architecture research and education. The Archer infrastructure integrates virtualization and batch scheduling middleware to deliver high-throughput computing…

Hardware Architecture · Computer Science 2015-05-13 Renato Figueiredo, P. Oscar Boykin, Jose A. B. Fortes, Tao Li, Jie-Kwon Peir, David Wolinsky, Lizy John, David Kaeli, David Lilja, Sally McKee, Gokhan Memik, Alain Roy, Gary Tyson

HDArray: Parallel Array Interface for Distributed Heterogeneous Devices

Heterogeneous clusters with nodes containing one or more accelerators, such as GPUs, have become common. While MPI provides inter-address space communication, and OpenCL provides a process with access to heterogeneous computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-19 Hyun Dok Cho, Okwan Kwon, Samuel P. Midkiff

ArcaDB: A Container-based Disaggregated Query Engine for Heterogenous Computational Environments

Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related with their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the…

Databases · Computer Science 2023-11-28 Kristalys Ruiz-Rohena, Manuel Rodriguez-Martinez

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-17 Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

AARC: Automated Affinity-aware Resource Configuration for Serverless Workflows

Serverless computing is increasingly adopted for its ability to manage complex, event-driven workloads without the need for infrastructure provisioning. However, traditional resource allocation in serverless platforms couples CPU and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-03 Lingxiao Jin, Zinuo Cai, Zebin Chen, Hongyu Zhao, Ruhui Ma

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is clearly not prone to scale with large thread…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-22 Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, Andrea Scarselli, Francesco Quaglia

Arrows for Parallel Computation

Arrows are a general interface for computation and an alternative to Monads for API design. In contrast to Monad-based parallelism, we explore the use of Arrows for specifying generalised parallelism. Specifically, we define an Arrow-based…

Programming Languages · Computer Science 2018-01-09 Martin Braun, Oleg Lobachev, Phil Trinder

Hardware locality-aware partitioning and dynamic load-balancing of unstructured meshes for large-scale scientific applications

We present an open-source topology-aware hierarchical unstructured mesh partitioning and load-balancing tool TreePart. The framework provides powerful abstractions to automatically detect and build hierarchical MPI topology resembling the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-04 Pavanakumar Mohanamuraly, Gabriel Staffelbach

PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory (Extended Version)

Caches at CPU nodes in disaggregated memory architectures amortize the high data access latency over the network. However, such caches are fundamentally unable to improve performance for workloads requiring pointer traversals across linked…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Yupeng Tang, Seung-seob Lee, Abhishek Bhattacharjee, Anurag Khandelwal

Skyhook: Towards an Arrow-Native Storage System

With the ever-increasing dataset sizes, several file formats such as Parquet, ORC, and Avro have been developed to store data efficiently, save the network, and interconnect bandwidth at the price of additional CPU utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-14 Jayjeet Chakraborty, Ivo Jimenez, Sebastiaan Alvarez Rodriguez, Alexandru Uta, Jeff LeFevre, Carlos Maltzahn