Related papers: Constructing Decision Trees from Data Streams

200 papers

Various modifications of decision trees have been used extensively in recent years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, at…

Machine Learning · Computer Science 2017-09-05 Dmitry Ignatov, Andrey Ignatov
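To illustrate the node-splitting step this abstract refers to, the sketch below scores binary threshold splits on a single numeric feature by information gain (entropy reduction), the classic greedy criterion behind ID3-style learners. It is not the feature-selection method of the paper above; the function names `entropy`, `information_gain`, and `best_threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy reduction obtained by splitting `labels` into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(xs, ys):
    """Greedily pick the threshold t (split rule: x <= t) with maximal gain."""
    best = (None, 0.0)
    # The largest distinct value is skipped because it would leave `right` empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = information_gain(ys, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))  # prints (2, 1.0)
```

Each candidate threshold is rescored from scratch, so this brute-force version costs O(k·n) for k distinct values; sorting once and sweeping running class counts is the usual optimization.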

We establish nearly optimal upper and lower bounds for approximating decision tree splits in data streams. For regression with labels in the range $\{0,1,\ldots,M\}$, we give a one-pass algorithm using $\tilde{O}(M^2/\epsilon)$ space that…

Data Structures and Algorithms · Computer Science 2026-04-23 Hoang Ta, Hoa T. Vu

Decision trees are one of the most popular classifiers in the machine learning literature. While the most common decision tree learning algorithms treat data as a batch, numerous algorithms have been proposed to construct decision trees…

Machine Learning · Computer Science 2026-01-21 Nikolaj Tatti

Decision tree optimization is fundamental to interpretable machine learning. The most popular approach is to greedily search for the best feature at every decision point, which is fast but provably suboptimal. Recent approaches find the…

Machine Learning · Computer Science 2025-11-19 Varun Babbar, Hayden McTavish, Cynthia Rudin, Margo Seltzer

Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a…

Machine Learning · Statistics 2016-04-13 Rocco De Rosa
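The confidence-interval idea this abstract builds on is best known from the Hoeffding tree (VFDT) split test: after n examples at a leaf, split when the observed gain difference between the best and second-best attributes exceeds the Hoeffding bound $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$, where $R$ is the range of the gain measure. The sketch below shows only that standard test, not the method proposed in this paper; the function names and the 0.05 tie-breaking threshold are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon: with probability at least 1 - delta, the mean of
    n i.i.d. observations with range value_range lies within epsilon of its
    expectation."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """VFDT-style test: split if the best attribute beats the runner-up by more
    than epsilon, or if epsilon has shrunk below a tie-breaking threshold
    (the two candidates are then considered equally good)."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > eps) or (eps < tie_threshold)


if __name__ == "__main__":
    # Binary classification with information gain, so R = log2(2) = 1.0.
    print(should_split(0.30, 0.05, 1.0, 1e-7, 1000))  # prints True
```

For information gain over c classes, R = log2(c). Because ε shrinks like 1/√n, a leaf that cannot yet separate two close candidates simply waits for more examples before committing to a split.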

In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal…

Machine Learning · Statistics 2021-05-10 V. Roshan Joseph, Akhil Vakayil

Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy…

Machine Learning · Computer Science 2015-11-13 Mohammad Norouzi, Maxwell D. Collins, Matthew Johnson, David J. Fleet, Pushmeet Kohli

Computing an optimal classification tree that provably maximizes training performance within a given size limit is NP-hard, and in practice, most state-of-the-art methods do not scale beyond computing optimal trees of depth three.…

Machine Learning · Computer Science 2025-01-15 Catalin E. Brita, Jacobus G. M. van der Linden, Emir Demirović

While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization…

Machine Learning · Statistics 2023-10-06 Shijin Gong, Xinyu Zhang

We present an efficient distributed online learning scheme to classify data captured from distributed, heterogeneous, and dynamic data sources. Our scheme consists of multiple distributed local learners that analyze different streams of…

Machine Learning · Computer Science 2013-08-27 Luca Canzian, Yu Zhang, Mihaela van der Schaar

We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…

Data Structures and Algorithms · Computer Science 2014-08-11 Marc Heinrich, Alexander Munteanu, Christian Sohler

This work studies one of the parallel decision tree learning algorithms, pdsCART, designed for scalable and efficient data analysis. The method incorporates three core capabilities. First, it supports real-time learning from data streams,…

Artificial Intelligence · Computer Science 2025-05-20 Zeinab Shiralizadeh

Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…

Machine Learning · Computer Science 2020-12-01 Matthew Nokleby, Haroon Raja, Waheed U. Bajwa

Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select, from a class of algorithms for a computational problem, an algorithm that has the best expected performance with respect to some…

Machine Learning · Computer Science 2024-06-05 Hongyu Cheng, Sammy Khalife, Barbara Fiedorowicz, Amitabh Basu

Decision trees and their ensembles are popular in machine learning as easy-to-understand models. Several techniques have been proposed in the literature for learning tree-based classifiers, with different techniques working well for data…

Machine Learning · Computer Science 2025-05-20 Maria-Florina Balcan, Dravyansh Sharma

In this paper we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a…

Machine Learning · Computer Science 2012-10-16 Naresh Manwani, P. S. Sastry

In this paper, we design the first streaming algorithms for the problem of multitasking scheduling on parallel machines with shared processing. In one pass, our streaming approximation schemes can provide an approximate value of the optimal…

Data Structures and Algorithms · Computer Science 2022-04-06 Bin Fu, Yumei Huo, Hairong Zhao

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data…

Machine Learning · Computer Science 2025-12-23 Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth, Stephen Roberts

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science 2022-07-19 Dimitris Bertsimas, Vassilis Digalakis

We study the problem of partitioning integer sequences in the one-pass data streaming model. We are given an input stream of integers $X \in \{0, 1, \dots, m\}^n$ of length $n$ with maximum element $m$, and a parameter $p$. The goal is to…

Data Structures and Algorithms · Computer Science 2014-07-08 Christian Konrad, László Kozma