Related papers: Benchmarking Distributed Stream Data Processing Sy…

200 papers

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-16 Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra

Internet of Things (IoT) is a technology paradigm where millions of sensors monitor, and help inform or manage, physical, environmental and human systems in real-time. The inherent closed-loop responsiveness and decision making of IoT…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-13 Anshu Shukla, Yogesh Simmhan

Growing data volumes and velocities in fields such as Industry 4.0 or the Internet of Things have led to the increased popularity of data stream processing systems. Enterprises can leverage these developments by enriching their core…

Performance · Computer Science 2021-03-12 Guenter Hesse, Christoph Matthies, Michael Perscheid, Matthias Uflacker, Hasso Plattner

Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…

Software Engineering · Computer Science 2023-11-02 Sören Henning, Wilhelm Hasselbring

Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the…

Software Engineering · Computer Science 2024-03-08 Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser

Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines.…

Software Engineering · Computer Science 2021-02-12 Sören Henning, Wilhelm Hasselbring

Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things (IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out. Apache Storm, originally developed by…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Anshu Shukla, Yogesh Simmhan

Recent advancements in data stream processing frameworks have improved real-time data handling, however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-04 Apurv Deepak Kulkarni, Siavash Ghiasvand

Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-30 Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Shilpa Chaturvedi, Sahil Tyagi, Yogesh Simmhan

This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) with an aim to provide accurate cost predictions of executing queries. A major premise of this work is that the proposed learned model can…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-11 Roman Heinrich, Manisha Luthra, Harald Kornmayer, Carsten Binnig

This paper presents a benchmark of stream processing throughput comparing Apache Spark Streaming (under file-, TCP socket- and Kafka-based stream integration), with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-20 Ben Blamey, Andreas Hellander, Salman Toor

Distributed Stream Processing Systems (DSPSs) form the backbone of real-time processing and analytics at ByteDance, where Apache Flink powers one of the largest production clusters worldwide. Ensuring resiliency, the ability to withstand…

Databases · Computer Science 2026-02-04 Yong Fang, Yuxing Han, Meng Wang, Yifan Zhang, Yue Ma, Chi Zhang

The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental and human systems in real-time. The inherent closed-loop responsiveness and decision…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Anshu Shukla, Shilpa Chaturvedi, Yogesh Simmhan

Data communication in cloud-based distributed stream data analytics often involves a collection of parallel and pipelined TCP flows. As the standard TCP congestion control mechanism is designed for achieving "fairness" among competing flows…

Networking and Internet Architecture · Computer Science 2019-08-08 Walid Aljoby, Xin Wang, Tom Z. J. Fu, Richard T. B. Ma

With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting…

Performance · Computer Science 2019-07-22 Guenter Hesse, Christoph Matthies, Kelvin Glass, Johannes Huegle, Matthias Uflacker

This paper presents LMStream, which ensures bounded latency while maximizing the throughput on the GPU-enabled micro-batch streaming systems. The main ideas behind LMStream's design can be summarized as two novel mechanisms: (1) dynamic…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-09 Suyeon Lee, Sungyong Park

As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly dependent on sophisticated dataflows and out-of-core methods for efficient system utilization. In addition, as HPC systems grow, memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-01 George K. Thiruvathukal, Cameron Christensen, Xiaoyong Jin, François Tessier, Venkatram Vishwanath

Distributed Stream Processing Systems (DSPSs) are among the currently most emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-06 Vinu E. Venugopal, Martin Theobald, Samira Chaychi, Amal Tawakuli

Apache Kafka has become a foundational platform for high throughput event streaming, enabling real time analytics, financial transaction processing, industrial telemetry, and large scale data driven systems. Despite its maturity and…

Software Engineering · Computer Science 2026-02-03 Muzeeb Mohammad