Related papers: Adaptive Normalization in Streaming Data
The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of spatial data in real-time. The current scale of spatial data cannot be handled using…
Streaming process mining deals with the real-time analysis of event streams. A common approach for it is to adopt windowing mechanisms that select event data from a stream for subsequent analysis. However, the size of these windows denotes…
This paper introduces a scheme for data stream processing which is robust to batch duration. Streaming frameworks process streams in batches retrieved at fixed time intervals. In a common setting a pattern recognition algorithm is applied…
Nowadays, every device connected to the Internet generates an ever-growing stream of data (formally, unbounded). Machine Learning on unbounded data streams is a grand challenge due to its resource constraints. In fact, standard machine…
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…
The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior using…
The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which…
Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their…
Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly…
Software as a service (SaaS) has recently enjoyed much attention as it makes the use of software more convenient and cost-effective. At the same time, the arising of users' expectation for high quality service such as real-time information…
To ensure reliability and service availability, next-generation networks are expected to rely on automated anomaly detection systems powered by advanced machine learning methods with the capability of handling multi-dimensional data. Such…
Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of…
Data communication in cloud-based distributed stream data analytics often involves a collection of parallel and pipelined TCP flows. As the standard TCP congestion control mechanism is designed for achieving "fairness" among competing flows…
Data sharding, a technique for partitioning and distributing data among multiple servers or nodes, offers enhancements in the scalability, performance, and fault tolerance of extensive distributed systems. Nonetheless, this strategy…
Fog computing extends the cloud computing paradigm by allocating substantial portions of computations and services towards the edge of a network, and is, therefore, particularly suitable for large-scale, geo-distributed, and data-intensive…
Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. Adjusting the parallelism of these operators is crucial to…
Ever-increasing amounts of data and requirements to process them in real time lead to more and more analytics platforms and software systems being designed according to the concept of stream processing. A common area of application is the…
The detection of anomalies in real time is paramount to maintain performance and efficiency across a wide range of applications including web services and smart manufacturing. This paper presents a novel algorithm to detect anomalies in…
Big data trend has enforced the data-centric systems to have continuous fast data streams. In recent years, real-time analytics on stream data has formed into a new research field, which aims to answer queries about what-is-happening-now…
In this article, motivated by biosurveillance and censoring sensor networks, we investigate the problem of distributed monitoring large-scale data streams where an undesired event may occur at some unknown time and affect only a few unknown…