Constructing Decision Trees from Data Streams

Data Structures and Algorithms; Artificial Intelligence; Machine Learning (2025-04-18, v4)

Abstract

In this work, we present data stream algorithms to compute optimal splits for decision tree learning. In particular, given a data stream of observations $x_i$ and their corresponding labels $y_i$, without the i.i.d. assumption, the objective is to identify the optimal split $j$ that partitions the data into two sets, minimizing the mean squared error (for regression) or the misclassification rate and Gini impurity (for classification). We propose several efficient streaming algorithms that require sublinear space and use a small number of passes to solve these problems. These algorithms can also be extended to the MapReduce model. Our results, while not directly comparable, complement the seminal work of Domingos-Hulten (KDD 2000) and Hulten-Spencer-Domingos (KDD 2001).
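
To make the regression objective concrete, the following is a minimal sketch of an exact baseline for the optimal-split problem. It is not one of the paper's streaming algorithms: it keeps one set of statistics per distinct observed value, so its space is linear in the number of distinct values rather than sublinear. It assumes, as a reading of the abstract rather than something it states, that a split $j$ sends observations with $x_i \le j$ to one side and $x_i > j$ to the other, and that each side is predicted by the mean of its labels. The function name `best_regression_split` is hypothetical.

```python
from collections import defaultdict


def best_regression_split(stream):
    """Return (j, error) minimizing the total squared error of the
    partition {x <= j} / {x > j}, with each side predicted by its mean.

    Illustrative exact baseline only; stores statistics per distinct x.
    """
    # Per-value sufficient statistics: [count, sum of y, sum of y^2].
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in stream:  # a single pass; arrival order does not matter
        s = stats[x]
        s[0] += 1
        s[1] += y
        s[2] += y * y

    values = sorted(stats)
    total_n = sum(stats[v][0] for v in values)
    total_s = sum(stats[v][1] for v in values)
    total_q = sum(stats[v][2] for v in values)

    def squared_error(n, s, q):
        # sum (y - mean)^2 = sum y^2 - (sum y)^2 / n
        return q - s * s / n if n > 0 else 0.0

    best_j, best_err = None, float("inf")
    n = s = q = 0.0
    # Skip the largest value: splitting there leaves the right side empty.
    for j in values[:-1]:
        cn, cs, cq = stats[j]
        n += cn
        s += cs
        q += cq
        err = (squared_error(n, s, q)
               + squared_error(total_n - n, total_s - s, total_q - q))
        if err < best_err:
            best_j, best_err = j, err
    return best_j, best_err
```

Dividing the returned error by the number of observations gives the mean squared error; since that count is fixed, the minimizing $j$ is the same. If the stream contains fewer than two distinct values, no valid split exists and the function returns `(None, inf)`.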

Cite

@article{arxiv.2403.19867,
  title  = {Constructing Decision Trees from Data Streams},
  author = {Huy Pham and Hoang Ta and Hoa T. Vu},
  journal= {arXiv preprint arXiv:2403.19867},
  year   = {2025}
}

Comments

To appear at ISIT 2025