PaSh: Light-touch Data-Parallel Shell Processing

Distributed, Parallel, and Cluster Computing 2021-04-06 v4 Programming Languages

Abstract

This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script, one that adds POSIX constructs to explicitly guide parallelism, coupled with PaSh-provided Unix-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU commands, two large and commonly used groups, guides the annotation language and optimized aggregator library that PaSh uses. Finally, PaSh's extensive evaluation over 44 unmodified Unix scripts shows significant speedups (0.89× to 61.1×, avg: 6.7×) stemming from the combination of its program transformations and runtime primitives.
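
To make the idea of a data-parallel rewrite concrete, the sketch below contrasts a sequential pipeline with a hand-written parallel equivalent. It is an illustration only and is not output produced by PaSh: the function names `seq_pipeline` and `par_pipeline`, the two-way split, and the use of `mktemp` and `sort -m` as the merge step are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical illustration of a data-parallel rewrite; not generated by PaSh.

# Sequential baseline: lowercase every line, then sort.
seq_pipeline() {
    tr '[:upper:]' '[:lower:]' < "$1" | sort
}

# Hand-parallelized equivalent: split, process halves concurrently, merge.
par_pipeline() {
    input=$1
    tmp=$(mktemp -d) || return 1   # scratch directory (assumes mktemp exists)
    total=$(wc -l < "$input")
    half=$(( (total + 1) / 2 ))

    # Split the input into two contiguous halves.
    head -n "$half" "$input" > "$tmp/in1"
    tail -n +"$((half + 1))" "$input" > "$tmp/in2"

    # Run the per-line stage and a local sort on each half in the background.
    tr '[:upper:]' '[:lower:]' < "$tmp/in1" | sort > "$tmp/out1" &
    tr '[:upper:]' '[:lower:]' < "$tmp/in2" | sort > "$tmp/out2" &
    wait   # block until both background pipelines finish

    # Aggregate: merge the two already-sorted partial results.
    sort -m "$tmp/out1" "$tmp/out2"
    rm -rf "$tmp"
}
```

Calling `par_pipeline words.txt` (with a hypothetical input file) should print the same lines as `seq_pipeline words.txt`. The `tr` stage can be applied to each half independently because it processes lines one at a time, whereas `sort` needs a merge step to combine partial results; distinctions of this kind are what the annotation language and aggregator library mentioned above are meant to capture.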

Cite

@article{arxiv.2007.09436,
  title  = {PaSh: Light-touch Data-Parallel Shell Processing},
  author = {Nikos Vasilakis and Konstantinos Kallas and Konstantinos Mamouras and Achilleas Benetopoulos and Lazar Cvetković},
  journal= {arXiv preprint arXiv:2007.09436},
  year   = {2021}
}

Comments

18 pages, 12 figures
