Related papers: Large-Scale Learning from Data Streams with Apache…

Rebalancing Learning on Evolving Data Streams

Nowadays, every device connected to the Internet generates an ever-growing stream of data (formally, unbounded). Machine Learning on unbounded data streams is a grand challenge due to its resource constraints. In fact, standard machine…

Machine Learning · Computer Science 2019-11-19 Alessio Bernardo , Emanuele Della Valle , Albert Bifet

Online Machine Learning in Big Data Streams

The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-19 András A. Benczúr , Levente Kocsis , Róbert Pálovics

Distributed Streaming Analytics on Large-scale Oceanographic Data using Apache Spark

Real-world data from diverse domains require real-time scalable analysis. Large-scale data processing frameworks or engines such as Hadoop fall short when results are needed on-the-fly. Apache Spark's streaming library is increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-02 Janak Dahal , Elias Ioup , Shaikh Arifuzzaman , Mahdi Abdelguerfi

A Framework of Sparse Online Learning and Its Applications

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high…

Machine Learning · Computer Science 2015-07-28 Dayong Wang , Pengcheng Wu , Peilin Zhao , Steven C. H. Hoi

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

Big data trend has enforced the data-centric systems to have continuous fast data streams. In recent years, real-time analytics on stream data has formed into a new research field, which aims to answer queries about what-is-happening-now…

Machine Learning · Statistics 2016-12-28 Amir Hossein Akhavan Rahnama

SAMOSA: Sharpness Aware Minimization for Open Set Active learning

Modern machine learning solutions require extensive data collection where labeling remains costly. To reduce this burden, open set active learning approaches aim to select informative samples from a large pool of unlabeled data that…

Machine Learning · Computer Science 2025-10-27 Young In Kim , Andrea Agiollo , Rajiv Khanna

Model-driven Scheduling for Distributed Stream Processing Systems

Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Anshu Shukla , Yogesh Simmhan

Big IoT and social networking data for smart cities: Algorithmic improvements on Big Data Analysis in the context of RADICAL city applications

In this paper we present a SOA (Service Oriented Architecture)-based platform, enabling the retrieval and analysis of big datasets stemming from social networking (SN) sites and Internet of Things (IoT) devices, collected by smart city…

Computers and Society · Computer Science 2016-07-05 Evangelos Psomakelis , Fotis Aisopos , Antonios Litke , Konstantinos Tserpes , Magdalini Kardara , Pablo Martínez Campo

Deep Learning with Apache SystemML

Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different…

Machine Learning · Computer Science 2018-02-14 Niketan Pansare , Michael Dusenberry , Nakul Jindal , Matthias Boehm , Berthold Reinwald , Prithviraj Sen

Adaptive Normalization in Streaming Data

In todays digital era, data are everywhere from Internet of Things to health care or financial applications. This leads to potentially unbounded ever-growing Big data streams and it needs to be utilized effectively. Data normalization is an…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-24 Vibhuti Gupta , Rattikorn Hewett

Fast Data Management with Distributed Streaming SQL

To stay competitive in today's data driven economy, enterprises large and small are turning to stream processing platforms to process high volume, high velocity, and diverse streams of data (fast data) as they arrive. Low-level programming…

Databases · Computer Science 2015-11-13 Milinda Pathirage , Beth Plale

Service Intelligence Oriented Distributed Data Stream Integration

Software as a service (SaaS) has recently enjoyed much attention as it makes the use of software more convenient and cost-effective. At the same time, the arising of users' expectation for high quality service such as real-time information…

Software Engineering · Computer Science 2016-04-13 Feng-Lin Li , Chi-Hung Chi , Yue Wang , Cong Liu

On the Semantic Overlap of Operators in Stream Processing Engines

Stream processing is extensively used in the IoT-to-Cloud spectrum to distill information from continuous streams of data. Streaming applications usually run in dedicated Stream Processing Engines (SPEs) that adopt the DataFlow model, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-03 Vincenzo Gulisano , Alessandro Margara , Marina Papatriantafilou

Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing

This paper presents a benchmark of stream processing throughput comparing Apache Spark Streaming (under file-, TCP socket- and Kafka-based stream integration), with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-20 Ben Blamey , Andreas Hellander , Salman Toor

Benchmarking Distributed Stream Data Processing Systems

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…

Databases · Computer Science 2019-06-27 Jeyhun Karimov , Tilmann Rabl , Asterios Katsifodimos , Roman Samarev , Henri Heiskanen , Volker Markl

BigDL: A Distributed Deep Learning Framework for Big Data

This paper presents BigDL (a distributed deep learning framework for Apache Spark), which has been used by a variety of users in the industry for building deep learning applications on production big data platforms. It allows deep learning…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-13 Jason Dai , Yiheng Wang , Xin Qiu , Ding Ding , Yao Zhang , Yanzhang Wang , Xianyan Jia , Cherry Zhang , Yan Wan , Zhichao Li , Jiao Wang , Shengsheng Huang , Zhongyuan Wu , Yang Wang , Yuhao Yang , Bowen She , Dongjie Shi , Qi Lu , Kai Huang , Guoqiong Song

Big Data-Driven Fraud Detection Using Machine Learning and Real-Time Stream Processing

In the age of digital finance, detecting fraudulent transactions and money laundering is critical for financial institutions. This paper presents a scalable and efficient solution using Big Data tools and machine learning models. We utilize…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-04 Chen Liu , Hengyu Tang , Zhixiao Yang , Ke Zhou , Sangwhan Cha

Overview of streaming-data algorithms

Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipments, and other…

Databases · Computer Science 2012-03-12 T Soni Madhulatha

DPASF: A Flink Library for Streaming Data preprocessing

Data preprocessing techniques are devoted to correct or alleviate errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many proposals for static Big Data…

Databases · Computer Science 2018-10-16 Alejandro Alcalde-Barros , Diego García-Gil , Salvador García , Francisco Herrera

Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka…

Software Engineering · Computer Science 2023-11-02 Sören Henning , Wilhelm Hasselbring