Related papers: SProBench: Stream Processing Benchmark for High Pe…

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks

Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the…

Software Engineering · Computer Science 2024-03-08 Sören Henning , Adriano Vogel , Michael Leichtfried , Otmar Ertl , Rick Rabiser

Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…

Software Engineering · Computer Science 2023-11-02 Sören Henning , Wilhelm Hasselbring

ESPBench: The Enterprise Stream Processing Benchmark

Growing data volumes and velocities in fields such as Industry 4.0 or the Internet of Things have led to the increased popularity of data stream processing systems. Enterprises can leverage these developments by enriching their core…

Performance · Computer Science 2021-03-12 Guenter Hesse , Christoph Matthies , Michael Perscheid , Matthias Uflacker , Hasso Plattner

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-16 Pratyush Agnihotri , Boris Koldehofe , Roman Heinrich , Carsten Binnig , Manisha Luthra

Benchmarking Distributed Stream Data Processing Systems

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…

Databases · Computer Science 2019-06-27 Jeyhun Karimov , Tilmann Rabl , Asterios Katsifodimos , Roman Samarev , Henri Heiskanen , Volker Markl

High-level Stream Processing: A Complementary Analysis of Fault Recovery

Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-14 Adriano Vogel , Sören Henning , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these…

Performance · Computer Science 2016-12-13 Patrick Dreher , Chansup Byun , Chris Hill , Vijay Gadepally , Bradley Kuszmaul , Jeremy Kepner

A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks

Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-30 Adriano Vogel , Sören Henning , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

First Experiences in Performance Benchmarking with the New SPEChpc 2021 Suites

Modern HPC systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-29 Holger Brunst , Sunita Chandrasekaran , Florina Ciorba , Nick Hagerty , Robert Henschel , Guido Juckeland , Junjie Li , Veronica G. Melesse Vergara , Sandra Wienke , Miguel Zavala

When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications

Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. While cloud environments offer several advantages for running benchmarks, it is often…

Software Engineering · Computer Science 2025-04-17 Sören Henning , Adriano Vogel , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

To Stream or Not to Stream: Towards A Quantitative Model for Remote HPC Processing Decisions

Modern scientific instruments generate data at rates that increasingly exceed local compute capabilities and, when paired with the staging and I/O overheads of file-based transfers, also render file-based use of remote HPC resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-01 Flavio Castro , Weijian Zheng , Joaquin Chung , Ian Foster , Rajkumar Kettimuthu

Continuous evaluation of the performance of cloud infrastructure for scientific applications

Cloud computing recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is now a growing need to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-14 Mohammad Mohammadi , Timur Bazhirov

Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing

This paper presents a benchmark of stream processing throughput comparing Apache Spark Streaming (under file-, TCP socket- and Kafka-based stream integration), with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-20 Ben Blamey , Andreas Hellander , Salman Toor

Performance Characterization and Modeling of Serverless and HPC Streaming Applications

Experiment-in-the-Loop Computing (EILC) requires support for numerous types of processing and the management of heterogeneous infrastructure over a dynamic range of scales: from the edge to the cloud and HPC, and intermediate resources.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-16 Andre Luckow , Shantenu Jha

Cloudprofiler: TSC-based inter-node profiling and high-throughput data ingestion for cloud streaming workloads

To conduct real-time analytics computations, big data stream processing engines are required to process unbounded data streams at millions of events per second. However, current streaming engines exhibit low throughput and high tuple…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-11 Shinhyung Yang , Jiun Jeong , Bernhard Scholz , Bernd Burgstaller

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-17 Joel Scheuner , Simon Eismann , Sacheendra Talluri , Erwin van Eyk , Cristina Abad , Philipp Leitner , Alexandru Iosup

AdaptMemBench: Application-Specific MemorySubsystem Benchmarking

Optimizing scientific applications to take full advan-tage of modern memory subsystems is a continual challenge forapplication and compiler developers. Factors beyond working setsize affect performance. A benchmark framework that…

Performance · Computer Science 2018-12-20 Mahesh Lakshminarasimhan , Catherine Olschanowsky

Designing Co-operation in Systems of Hierarchical, Multi-objective Schedulers for Stream Processing

Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for various tasks such as deriving product insights, providing and improving user…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-09 Animesh Dangwal , Yufeng Jiang , Charlie Arnold , Jun Fan , Mohamed Bassem , Aish Rajagopal

Scalable Systems and Software Architectures for High-Performance Computing on cloud platforms

High-performance computing (HPC) is essential for tackling complex computational problems across various domains. As the scale and complexity of HPC applications continue to grow, the need for scalable systems and software architectures…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-21 Risshab Srinivas Ramesh

Benchmark Framework with Skewed Workloads

In this work, we present a new benchmarking suite with new real-life inspired skewed workloads to test the performance of concurrent index data structures. We started this project to prepare workloads specifically for self-adjusting data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-19 Vitaly Aksenov , Dmitry Ivanov , Ravil Galiev