Related papers: HyperStream: a Workflow Engine for Streaming Data
Hospitals around the world collect massive amounts of physiological data from their patients every day. Recently, there has been an increase in research interest to subject this data to statistical analysis to gain more insights and provide…
River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning…
This paper introduces H-STREAM, a big stream/data processing pipelines evaluation engine that proposes stream processing operators as micro-services to support the analysis and visualisation of Big Data streams stemming from IoT (Internet…
Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to…
This is a position paper, submitted to the Future Online Analysis Platform Workshop (https://press3.mcs.anl.gov/futureplatform/), which argues that simple data analysis applications are common today, but future online supercomputing…
Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and…
Big data streaming applications require utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such…
Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous…
We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…
As more and more devices connect to Internet of Things, unbounded streams of data will be generated, which have to be processed "on the fly" in order to trigger automated actions and deliver real-time services. Spark Streaming is a popular…
An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Thing (IoT) sensors. The development of streaming applications…
This paper presents a novel high speed clustering scheme for high dimensional data streams. Data stream clustering has gained importance in different applications, for example, in network monitoring, intrusion detection, and real-time…
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…
Data streams are a sequence of data flowing between source and destination processes. Streaming is widely used for signal, image and video processing for its efficiency in pipelining and effectiveness in reducing demand for memory. The goal…
Stream computing is the use of multiple autonomic and parallel modules together with integrative processors at a higher level of abstraction to embody "intelligent" processing. The biological basis of this computing is sketched and the…
Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement…
Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…
We introduce DataCI, a comprehensive open-source platform designed specifically for data-centric AI in dynamic streaming data settings. DataCI provides 1) an infrastructure with rich APIs for seamless streaming dataset management,…
Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…