A Clustering-based Framework for Classifying Data Streams

Machine Learning 2021-06-23 v1 Artificial Intelligence

Abstract

The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques to data streams, these approaches either require an initial label set or rely on specialized design parameters. The overlap among classes and the labeling of data streams are further major challenges for data stream classification. In this paper, we propose a clustering-based data stream classification framework that handles non-stationary data streams without using an initial label set. A density-based stream clustering procedure with a dynamic threshold captures novel concepts, and an effective active label querying strategy continuously learns the new concepts from the data stream. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies show that the proposed method achieves performance that is statistically better than or comparable to that of existing methods.
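To make the general idea concrete, the following is a minimal sketch of cluster-then-query stream classification. It is not the authors' algorithm: it replaces the density-based procedure with a simple nearest-centroid rule, omits the sub-cluster handling of class overlap, and uses an assumed threshold update (a multiple of the running mean assignment distance, bounded below). The names `StreamClusterClassifier`, `initial_threshold`, `factor`, `min_threshold`, and `oracle` are all hypothetical.

```python
import math


class StreamClusterClassifier:
    """Illustrative sketch only: centroid-based clustering with label queries."""

    def __init__(self, initial_threshold=1.0, factor=3.0, min_threshold=0.5):
        self.centroids = []   # one centroid (list of floats) per cluster
        self.counts = []      # number of points absorbed by each cluster
        self.labels = []      # label obtained when the cluster was created
        self.threshold = initial_threshold
        self.factor = factor              # assumed multiplier for the threshold
        self.min_threshold = min_threshold
        self._dist_sum = 0.0  # running sum of assignment distances
        self._dist_n = 0
        self.queries = 0      # number of labels requested from the oracle

    def _nearest(self, x):
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def process(self, x, oracle):
        """Predict a label for x, querying the oracle only for novel clusters."""
        i, d = self._nearest(x)
        if i is None or d > self.threshold:
            # Point is far from every cluster: treat it as a novel concept,
            # open a new cluster, and ask for its label.
            self.centroids.append(list(x))
            self.counts.append(1)
            self.labels.append(oracle(x))
            self.queries += 1
            return self.labels[-1]
        # Otherwise absorb the point with an incremental mean update.
        self.counts[i] += 1
        n = self.counts[i]
        self.centroids[i] = [c + (v - c) / n for c, v in zip(self.centroids[i], x)]
        # Assumed dynamic threshold: scaled running mean of assignment distances.
        self._dist_sum += d
        self._dist_n += 1
        mean_d = self._dist_sum / self._dist_n
        self.threshold = max(self.min_threshold, self.factor * mean_d)
        return self.labels[i]


# Toy stream with two well-separated concepts; the oracle stands in for a human labeler.
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
oracle = lambda x: "a" if x[0] < 2.5 else "b"

clf = StreamClusterClassifier()
preds = [clf.process(x, oracle) for x in stream]
print(preds, clf.queries)
```

In this toy run, a label is requested only when a point opens a new cluster, which reflects the motivation for active querying: the labeling cost scales with the number of discovered concepts rather than with the length of the stream.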

Cite

@article{arxiv.2106.11823,
  title  = {A Clustering-based Framework for Classifying Data Streams},
  author = {Xuyang Yan and Abdollah Homaifar and Mrinmoy Sarkar and Abenezer Girma and Edward Tunstel},
  journal= {arXiv preprint arXiv:2106.11823},
  year   = {2021}
}

Comments

This paper has been accepted by IJCAI 2021