Related papers: ESPBench: The Enterprise Stream Processing Benchma…
The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…
Recent advancements in data stream processing frameworks have improved real-time data handling; however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines…
Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…
Distributed stream processing frameworks help build scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the…
The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…
With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting…
This paper presents a benchmark of stream processing throughput comparing Apache Spark Streaming (under file-, TCP socket- and Kafka-based stream integration), with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput…
Apache Kafka has become a foundational platform for high-throughput event streaming, enabling real-time analytics, financial transaction processing, industrial telemetry, and large-scale data-driven systems. Despite its maturity and…
Big data processing is a hot topic in today's computer science world. There is a significant demand for analysing big data to satisfy many requirements of many industries. Emergence of the Kappa architecture created a strong requirement for…
Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous…
The immense growth of data demands switching from traditional data processing solutions to systems that can process a continuous stream of real-time data. Various applications employ stream processing systems to provide solutions to…
Internet of Things (IoT) is a technology paradigm where millions of sensors monitor, and help inform or manage, physical, environmental and human systems in real-time. The inherent closed-loop responsiveness and decision making of IoT…
Today, we have to deal with large amounts of data (Big Data), and we need to choose an architectural framework to analyze data coming from different areas. Because of this, it becomes problematic when we want to process these data,…
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire…
Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the…
As more and more devices connect to the Internet of Things, unbounded streams of data will be generated, which have to be processed "on the fly" in order to trigger automated actions and deliver real-time services. Spark Streaming is a popular…
Serverless computing has emerged as a prominent paradigm, with a significant adoption rate among cloud customers. While this model offers advantages such as abstraction from the deployment and resource scheduling, it also poses limitations…
Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several…
This paper presents LMStream, which ensures bounded latency while maximizing throughput on GPU-enabled micro-batch streaming systems. The main ideas behind LMStream's design can be summarized as two novel mechanisms: (1) dynamic…
[Background] Nowadays, there is a massive growth of data volume and speed in many types of systems. It introduces new needs for infrastructure and applications that have to handle streams of data with low latency and high throughput.…