Related papers: Efficient Time-Evolving Stream Processing at Scale

200 papers

Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-01-28 · Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, Nicolas Kourtellis, Marco Serafini

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science · 2022-07-19 · Dimitris Bertsimas, Vassilis Digalakis

Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. While randomized hashing on the keys is capable of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-12-14 · Junhua Fang, Rong Zhang, Tom Z. J. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu

The exponential growth of data storage demands has necessitated the evolution of hierarchical storage management strategies [1]. This study explores the application of streaming machine learning [3] to revolutionize data prefetching within…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-30 · Chiyu Cheng, Chang Zhou, Yang Zhao, Jin Cao

This paper introduces a scheme for data stream processing which is robust to batch duration. Streaming frameworks process streams in batches retrieved at fixed time intervals. In a common setting a pattern recognition algorithm is applied…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-02-20 · David Tolpin

The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior using…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-05-13 · Anshu Shukla, Yogesh Simmhan

When processing data streams with highly skewed and nonstationary key distributions, we often observe overloaded partitions when the hash partitioning fails to balance data correctly. To avoid slow tasks that delay the completion of the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-06-01 · Zoltán Zvara, Péter G. N. Szabó, Balázs Barnabás Lóránt, András A. Benczúr

The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior using…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-03-28 · Nanjangud C. Narendra, Sambit Nayak, Anshu Shukla

Whilst computational resources at the cloud edge can be leveraged to improve latency and reduce the costs of cloud services for a wide variety of mobile, web, and IoT applications, such resources are naturally constrained. For distributed…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-12-20 · Ben Blamey, Ida-Maria Sintorn, Andreas Hellander, Salman Toor

To conduct real-time analytics computations, big data stream processing engines are required to process unbounded data streams at millions of events per second. However, current streaming engines exhibit low throughput and high tuple…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-08-11 · Shinhyung Yang, Jiun Jeong, Bernhard Scholz, Bernd Burgstaller

With the rapid growth in the number of devices of the Internet of Things (IoT), the volume and types of stream data are rapidly increasing in the real world. Unfortunately, the stream data has the characteristics of infinite and periodic…

Performance · Computer Science · 2022-12-13 · Weirong Xiu, Baozhu Li, Xusheng Du, Zheng Chu

Hospitals around the world collect massive amounts of physiological data from their patients every day. Recently, there has been an increase in research interest to subject this data to statistical analysis to gain more insights and provide…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-04 · Anand Jayarajan, Kimberly Hau, Andrew Goodwin, Gennady Pekhimenko

Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-12-05 · Marcos Dias de Assuncao, Alexandre da Silva Veith, Rajkumar Buyya

Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data…

Databases · Computer Science · 2017-05-17 · Xiangnan Ren, Olivier Curé

Ubiquitous sensors today emit high frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal…

Machine Learning · Computer Science · 2025-04-04 · Arik Ermshaus, Patrick Schäfer, Ulf Leser

Many distributed machine learning frameworks have recently been built to speed up the large-scale data learning process. However, most distributed machine learning used in these frameworks still uses an offline algorithm model which cannot…

Artificial Intelligence · Computer Science · 2018-07-19 · Mahardhika Pratama, Choiru Za'in, Eric Pardede

Many applications process a stream of tuples over a window duration, and require the results within a specified deadline after the end of the window. For such scenarios, processing tuples intermittently (in batches) instead of eagerly…

Databases · Computer Science · 2026-05-19 · Saranya Chandrasekaran, S. Sudarshan

State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-21 · Sachini Jayasekara, Aaron Harwood, Shanika Karunasekera

An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-07-23 · Haruna Isah, Farhana Zulkernine

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science · 2021-12-21 · Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger