Large-Scale Learning from Data Streams with Apache SAMOA

Distributed, Parallel, and Cluster Computing 2018-05-30 v1

Abstract

Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, owing to the time and memory complexity involved. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression, as well as programming abstractions for developing new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines, such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at https://samoa.incubator.apache.org under the Apache License, version 2.0.

Cite

@article{arxiv.1805.11477,
  title  = {Large-Scale Learning from Data Streams with Apache SAMOA},
  author = {Nicolas Kourtellis and Gianmarco De Francisci Morales and Albert Bifet},
  journal= {arXiv preprint arXiv:1805.11477},
  year   = {2018}
}

Comments

31 pages, 7 Tables, 16 Figures, 26 References. arXiv admin note: substantial text overlap with arXiv:1607.08325
