Related papers: Estimating decision tree learnability with polylog…

Active Learning for Decision Trees with Provable Guarantees

This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for…

Machine Learning · Computer Science 2026-02-20 Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi

Top-down induction of decision trees: rigorous guarantees and inherent limitations

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the…

Data Structures and Algorithms · Computer Science 2019-11-19 Guy Blanc, Jane Lange, Li-Yang Tan

Provable guarantees for decision tree induction: the agnostic setting

We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics. While prior works have focused on the realizable setting, we consider the more…

Data Structures and Algorithms · Computer Science 2020-06-02 Guy Blanc, Jane Lange, Li-Yang Tan

Tree Learning: Optimal Algorithms and Sample Complexity

We study the problem of learning a hierarchical tree representation of data from labeled samples, taken from an arbitrary (and possibly adversarial) distribution. Consider a collection of data tuples labeled according to their hierarchical…

Machine Learning · Computer Science 2023-02-10 Dmitrii Avdiukhin, Grigory Yaroslavtsev, Danny Vainstein, Orr Fischer, Sauman Das, Faraz Mirza

Universal guarantees for decision tree induction via a higher-order splitting criterion

We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform…

Machine Learning · Computer Science 2020-10-20 Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

Logarithmic Time Online Multiclass prediction

We study the problem of multiclass classification with an extremely large number of classes (k), with the goal of obtaining train and test time complexity logarithmic in the number of classes. We develop top-down tree construction…

Machine Learning · Computer Science 2015-11-17 Anna Choromanska, John Langford

On the computational complexity of the probabilistic label tree algorithms

Label tree-based algorithms are widely used to tackle multi-class and multi-label problems with a large number of labels. We focus on a particular subclass of these algorithms that use probabilistic classifiers in the tree nodes. Examples…

Machine Learning · Computer Science 2019-06-04 Robert Busa-Fekete, Krzysztof Dembczynski, Alexander Golovnev, Kalina Jasinska, Mikhail Kuznetsov, Maxim Sviridenko, Chao Xu

Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks

The problem of adversarial robustness has been studied extensively for neural networks. However, for boosted decision trees and decision stumps there are almost no results, even though they are widely used in practice (e.g. XGBoost) due to…

Machine Learning · Computer Science 2019-11-01 Maksym Andriushchenko, Matthias Hein

An Analysis of Reduced Error Pruning

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the…

Artificial Intelligence · Computer Science 2011-06-06 T. Elomaa, M. Kaariainen

How hard is learning to cut? Trade-offs and sample complexity

In the recent years, branch-and-cut algorithms have been the target of data-driven approaches designed to enhance the decision making in different phases of the algorithm such as branching, or the choice of cutting planes (cuts). In…

Optimization and Control · Mathematics 2025-06-03 Sammy Khalife, Andrea Lodi

MurTree: Optimal Classification Trees via Dynamic Programming and Search

Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy.…

Machine Learning · Computer Science 2022-06-30 Emir Demirović, Anna Lukina, Emmanuel Hebrard, Jeffrey Chan, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, Peter J. Stuckey

Estimating Learnability in the Sublinear Data Regime

We consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. We show that it is often possible to accurately estimate this "learnability" even when given an amount of data that is too…

Machine Learning · Computer Science 2019-03-26 Weihao Kong, Gregory Valiant

Decision Tree Learning on Product Spaces

Decision tree learning has long been a central topic in theoretical computer science, driven by its practical importance. A fundamental and widely used method for decision tree construction is the top-down greedy heuristic, which…

Machine Learning · Computer Science 2026-05-14 Arshia Soltani Moakhar, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi

Multiple-Goal Heuristic Search

This paper presents a new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources. We show the inadequacy of traditional distance-estimation heuristics for tasks of this…

Artificial Intelligence · Computer Science 2015-03-19 D. Davidov, S. Markovitch

Learning accurate and interpretable tree-based models

Decision trees and their ensembles are popular in machine learning as easy-to-understand models. Several techniques have been proposed in the literature for learning tree-based classifiers, with different techniques working well for data…

Machine Learning · Computer Science 2025-05-20 Maria-Florina Balcan, Dravyansh Sharma

Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling

Learning-assisted hyper-heuristics can select among dispatching rules while preserving the feasibility and interpretability of constructive Job Shop Scheduling Problem (JSSP) heuristics. Their main computational cost lies in label…

Artificial Intelligence · Computer Science 2026-05-26 Junhao Wei, Yanxiao Li, Yifu Zhao, Zhenhong Peng, Baili Lu, Dexing Yao, Haochen Li, Qinbin He, Sio-Kei Im, Yapeng Wang, Xu Yang

Multidimensional Belief Quantification for Label-Efficient Meta-Learning

Optimization-based meta-learning offers a promising direction for few-shot learning that is essential for many real-world computer vision applications. However, learning from few samples introduces uncertainty, and quantifying model…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Deep Pandey, Qi Yu

On the Robustness of Decision Tree Learning under Label Noise

In most practical problems of classifier learning, the training data suffers from the label noise. Hence, it is important to understand how robust is a learning algorithm to such label noise. This paper presents some theoretical analysis to…

Machine Learning · Computer Science 2016-08-29 Aritra Ghosh, Naresh Manwani, P. S. Sastry

Multi-label Classification under Uncertainty: A Tree-based Conformal Prediction Approach

Multi-label classification is a common challenge in various machine learning applications, where a single data instance can be associated with multiple classes simultaneously. The current paper proposes a novel tree-based method for…

Methodology · Statistics 2024-05-01 Chhavi Tyagi, Wenge Guo

Probabilistic Label Trees for Extreme Multi-label Classification

Extreme multi-label classification (XMLC) is a learning task of tagging instances with a small subset of relevant labels chosen from an extremely large pool of possible labels. Problems of this scale can be efficiently handled by organizing…

Machine Learning · Computer Science 2020-09-24 Kalina Jasinska-Kobus, Marek Wydmuch, Krzysztof Dembczynski, Mikhail Kuznetsov, Robert Busa-Fekete