Related papers: Cloudprofiler: TSC-based inter-node profiling and …
Recent advancements in data stream processing frameworks have improved real-time data handling, however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines…
Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using…
Modern scientific instruments generate data at rates that increasingly exceed local compute capabilities and, when paired with the staging and I/O overheads of file-based transfers, also render file-based use of remote HPC resources…
Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…
The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…
Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires…
Streaming analysis is widely used in cloud as well as edge infrastructures. In these contexts, fine-grained application performance can be based on accurate modeling of streaming operators. This is especially beneficial for computationally…
Time-evolving stream datasets exist ubiquitously in many real-world applications where their inherent hot keys often evolve over times. Nevertheless, few existing solutions can provide efficient load balance on these time-evolving datasets…
An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large…
Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for various tasks such as deriving product insights, providing and improving user…
Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software…
The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior using…
Many of the existing sentiment analysis techniques are based on supervised learning, and they demand the availability of valuable training datasets to train their models. When dataset freshness is critical, the annotating of high speed…
The proliferation of sensors over the last years has generated large amounts of raw data, forming data streams that need to be processed. In many cases, cloud resources are used for such processing, exploiting their flexibility, but these…
Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by…
Triangle counting is a fundamental and widely studied problem on static graphs, and recently on temporal graphs, where edges carry information on the timings of the associated events. Streaming processing and resource efficiency are crucial…
Experiment-in-the-Loop Computing (EILC) requires support for numerous types of processing and the management of heterogeneous infrastructure over a dynamic range of scales: from the edge to the cloud and HPC, and intermediate resources.…
In recent years, with the rapid development of sensing technology and the Internet of Things (IoT), sensors play increasingly important roles in traffic control, medical monitoring, industrial production and etc. They generated high volume…
While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration…
High-Definition (HD) maps are essential for the safety of autonomous driving systems. While existing techniques employ camera images and onboard sensors to generate vectorized high-precision maps, they are constrained by their reliance on…