Related papers: Thrill: High-Performance Algorithmic Distributed B…
MapReduce and its variants have significantly simplified and accelerated the process of developing parallel programs. However, most MapReduce implementations focus on data-intensive tasks while many real-world tasks are compute intensive…
Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent…
Big data processing is a hot topic in today's computer science world. There is a significant demand for analysing big data to satisfy many requirements of many industries. Emergence of the Kappa architecture created a strong requirement for…
As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly dependent on sophisticated dataflows and out-of-core methods for efficient system utilization. In addition, as HPC systems grow, memory…
High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick…
Efficient code retrieval is critical for developer productivity, yet existing benchmarks largely focus on Python and rarely stress-test robustness beyond superficial lexical cues. To address the gap, we introduce an automated pipeline for…
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems.…
Distributed in-memory data processing engines accelerate iterative applications by caching substantial datasets in memory rather than recomputing them in each iteration. Selecting a suitable cluster size for caching these datasets plays an…
Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…
The DataFlow is sub-system of the ATLAS data acquisition responsible for the reception, buffering and subsequent movement of partial and full event data to the higher level triggers: Level 2 and Event Filter. The design of the software is…
Spark is an in-memory analytics platform that targets commodity server environments today. It relies on the Hadoop Distributed File System (HDFS) to persist intermediate checkpoint states and final processing results. In Spark, immutable…
The objective of this work was to utilize BigBench [1] as a Big Data benchmark and evaluate and compare two processing engines: MapReduce [2] and Spark [3]. MapReduce is the established engine for processing data on Hadoop. Spark is a…
Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms,…
Data preprocessing techniques are devoted to correct or alleviate errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many proposals for static Big Data…
Compressed bitmap indexes are used in systems such as Git or Oracle to accelerate queries. They represent sets and often support operations such as unions, intersections, differences, and symmetric differences. Several important systems…
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…
We present a modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight and self-contained class, and does not have any…
The C/C++ memory model provides an interface and execution model for programmers of concurrent (shared-variable) code. It provides a range of mechanisms that abstract from underlying hardware memory models -- that govern how multicore…
The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA…