Engineering Crowdsourced Stream Processing Systems

Databases 2014-08-05 v3 Artificial Intelligence Software Engineering

Abstract

A crowdsourced stream processing (CSP) system is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied to a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also substantially expands the capabilities of data processing systems. Engineering a CSP system requires combining human and machine computation elements. From a general systems theory perspective, this means taking into account both the properties inherited from these elements and those that emerge from combining them. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems through a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR achieves higher classification accuracy, while compared to a pure crowdsourcing solution, it makes better use of human workers by requiring much less manual effort.

Cite

@article{arxiv.1310.5463,
  title  = {Engineering Crowdsourced Stream Processing Systems},
  author = {Muhammad Imran and Ioanna Lykourentzou and Yannick Naudet and Carlos Castillo},
  journal= {arXiv preprint arXiv:1310.5463},
  year   = {2014}
}