Related papers: ShuffleBench: A Benchmark for Large-Scale Data Shu…

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure

Recent advancements in data stream processing frameworks have improved real-time data handling, however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-04 Apurv Deepak Kulkarni , Siavash Ghiasvand

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-16 Pratyush Agnihotri , Boris Koldehofe , Roman Heinrich , Carsten Binnig , Manisha Luthra

A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks

Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-30 Adriano Vogel , Sören Henning , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…

Software Engineering · Computer Science 2023-11-02 Sören Henning , Wilhelm Hasselbring

Benchmarking Distributed Stream Data Processing Systems

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…

Databases · Computer Science 2019-06-27 Jeyhun Karimov , Tilmann Rabl , Asterios Katsifodimos , Roman Samarev , Henri Heiskanen , Volker Markl

ESPBench: The Enterprise Stream Processing Benchmark

Growing data volumes and velocities in fields such as Industry 4.0 or the Internet of Things have led to the increased popularity of data stream processing systems. Enterprises can leverage these developments by enriching their core…

Performance · Computer Science 2021-03-12 Guenter Hesse , Christoph Matthies , Michael Perscheid , Matthias Uflacker , Hasso Plattner

Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines.…

Software Engineering · Computer Science 2021-02-12 Sören Henning , Wilhelm Hasselbring

Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-07 Faheem Ullah , Shagun Dhingra , Xiaoyu Xia , M. Ali Babar

High-level Stream Processing: A Complementary Analysis of Fault Recovery

Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-14 Adriano Vogel , Sören Henning , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications

Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. While cloud environments offer several advantages for running benchmarks, it is often…

Software Engineering · Computer Science 2025-04-17 Sören Henning , Adriano Vogel , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

WfBench: Automated Generation of Scientific Workflow Benchmarks

The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-10 Tainã Coleman , Henri Casanova , Ketan Maheshwari , Loïc Pottier , Sean R. Wilkinson , Justin Wozniak , Frédéric Suter , Mallikarjun Shankar , Rafael Ferreira da Silva

Ranking and benchmarking framework for sampling algorithms on synthetic data streams

In the fields of big data, AI, and streaming processing, we work with large amounts of data from multiple sources. Due to memory and network limitations, we process data streams on distributed systems to alleviate computational and network…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-18 József Dániel Gáspár , Martin Horváth , Győző Horváth , Zoltán Zvara

Run Time Approximation of Non-blocking Service Rates for Streaming Systems

Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires…

Performance · Computer Science 2015-04-14 Jonathan C. Beard , Roger D. Chamberlain

GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams

Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data.…

Databases · Computer Science 2020-08-04 Salman Ahmed Shaikh , Komal Mariam , Hiroyuki Kitagawa , Kyoung-Sook Kim

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-17 Joel Scheuner , Simon Eismann , Sacheendra Talluri , Erwin van Eyk , Cristina Abad , Philipp Leitner , Alexandru Iosup

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci , Massimo Torquati , Massimiliano Meneghin

Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-30 Bilal Akil , Ying Zhou , Uwe Röhm

Model-driven Scheduling for Distributed Stream Processing Systems

Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Anshu Shukla , Yogesh Simmhan

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference

Distributed inference serves as a promising approach to enabling the inference of large language models (LLMs) at the network edge. It distributes the inference process to multiple devices to ensure that the LLMs can fit into the device…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-13 Xing Liu , Lizhuo Luo , Ming Tang , Chao Huang , Xu Chen

A quality model for evaluating and choosing a stream processing framework architecture

Today, we have to deal with many data (Big data) and we need to make decisions by choosing an architectural framework to analyze these data coming from different area. Due to this, it become problematic when we want to process these data,…

Software Engineering · Computer Science 2019-01-29 Youness Dendane , Fabio Petrillo , Hamid Mcheick , Souhail Ben Ali