Related papers: Runtime-optimized Multi-way Stream Join Operator f…
Streaming computing effectively manages large-scale streaming data in real-time, making it ideal for applications such as real-time recommendations, anomaly detection, and monitoring, all of which require immediate processing. In this…
Streaming analysis is widely used in cloud as well as edge infrastructures. In these contexts, fine-grained application performance can be based on accurate modeling of streaming operators. This is especially beneficial for computationally…
Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…
Stream workflow application such as online anomaly detection or online traffic monitoring, integrates multiple streaming big data applications into data analysis pipeline. This application can be highly dynamic in nature, where the data…
Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It is commonly demanding in computation capacity, given that the relevant applications involve very large volumes of data. Data…
Streaming data join is a critical process in the field of near-real-time data warehousing. For this purpose, an adaptive semi-stream join algorithm called CACHEJOIN (Cache Join) focusing non-uniform stream data is provided in the…
Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the…
Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. Adjusting the parallelism of these operators is crucial to…
We address the joint optimization of multiple stream joins in a scale-out architecture by tailoring prior work on multi-way stream joins to predicate-driven data partitioning schemes. We present an integer linear programming (ILP)…
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
The high-dimensionality and volume of large scale multistream data has inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming…
In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack…
It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results…
For decades, the join operator over fast data streams has always drawn much attention from the database community, due to its wide spectrum of real-world applications, such as online clustering, intrusion detection, sensor data monitoring,…
Software as a service (SaaS) has recently enjoyed much attention as it makes the use of software more convenient and cost-effective. At the same time, the arising of users' expectation for high quality service such as real-time information…
The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window…
One of the most important issues in data stream processing systems is to use operator migration to handle highly variable workloads in a cost-efficient manner and adapt to the needs at any given time on demand. Operator migration is a…
There is increasing interest in using multicore processors to accelerate stream processing. For example, indexing sliding window content to enhance the performance of streaming queries is greatly improved by utilizing the computational…
In this paper, we design the first streaming algorithms for the problem of multitasking scheduling on parallel machines with shared processing. In one pass, our streaming approximation schemes can provide an approximate value of the optimal…