Related papers: Exploiting Categorical Structure Using Tree-Based …

Tree-Structured Modelling of Categorical Predictors in Regression

Generalized linear and additive models are very efficient regression tools but the selection of relevant terms becomes difficult if higher order interactions are needed. In contrast, tree-based methods also known as recursive partitioning…

Methodology · Statistics 2015-04-21 Gerhard Tutz , Moritz Berger

Ordinal Trees and Random Forests: Score-Free Recursive Partitioning and Improved Ensembles

Existing ordinal trees and random forests typically use scores that are assigned to the ordered categories, which implies that a higher scale level is used. Versions of ordinal trees are proposed that take the scale level seriously and…

Methodology · Statistics 2021-02-02 Gerhard Tutz

StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables

Gradient boosting methods based on Structured Categorical Decision Trees (SCDT) have been demonstrated to outperform numerical and one-hot-encodings on problems where the categorical variable has a known underlying structure. However, the…

Machine Learning · Statistics 2020-07-10 Brian Lucena

Causal Structure Learning

Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that not only represent the distribution of the observed system…

Methodology · Statistics 2017-06-29 Christina Heinze-Deml , Marloes H. Maathuis , Nicolai Meinshausen

Reinforced Decision Trees

In order to speed-up classification models when facing a large number of categories, one usual approach consists in organizing the categories in a particular structure, this structure being then used as a way to speed-up the prediction…

Machine Learning · Computer Science 2015-11-26 Aurélia Léon , Ludovic Denoyer

Improving the precision of classification trees

Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…

Applications · Statistics 2010-11-03 Wei-Yin Loh

Classification Trees with Valid Inference via the Exponential Mechanism

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely…

Methodology · Statistics 2026-04-14 Soham Bakshi , Snigdha Panigrahi

Dive into Decision Trees and Forests: A Theoretical Demonstration

Based on decision trees, many fields have arguably made tremendous progress in recent years. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…

Machine Learning · Computer Science 2021-01-22 Jinxiong Zhang

Building Trees for Probabilistic Prediction via Scoring Rules

Decision trees built with data remain in widespread use for nonparametric prediction. Predicting probability distributions is preferred over point predictions when uncertainty plays a prominent role in analysis and decision-making. We study…

Methodology · Statistics 2024-06-21 Sara Shashaani , Ozge Surer , Matthew Plumlee , Seth Guikema

Structuring Causal Tree Models with Continuous Variables

This paper considers the problem of invoking auxiliary, unobservable variables to facilitate the structuring of causal tree models for a given set of continuous variables. Paralleling the treatment of bi-valued variables in [Pearl 1986], we…

Artificial Intelligence · Computer Science 2013-04-11 Lei Xu , Judea Pearl

Interval Temporal Logic Decision Tree Learning

Decision trees are simple, yet powerful, classification models used to classify categorical and numerical data, and, despite their simplicity, they are commonly used in operations research and management, as well as in knowledge mining.…

Logic in Computer Science · Computer Science 2020-03-13 Andrea Brunello , Guido Sciavicco , Ionel Eduard Stan

Tree-Structured Modelling of Varying Coefficients

The varying-coefficient model is a strong tool for the modelling of interactions in generalized regression. It is easy to apply if both the variables that are modified as well as the effect modifiers are known. However, in general one has a…

Methodology · Statistics 2017-05-25 Moritz Berger , Gerhard Tutz , Matthias Schmid

Leveraging Predictive Equivalence in Decision Trees

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be…

Machine Learning · Computer Science 2025-10-15 Hayden McTavish , Zachery Boner , Jon Donnelly , Margo Seltzer , Cynthia Rudin

Finding structure in data using multivariate tree boosting

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles like…

Machine Learning · Statistics 2017-02-15 Patrick J. Miller , Gitta H. Lubke , Daniel B. McArtor , C. S. Bergeman

A new class of generative classifiers based on staged tree models

Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly…

Artificial Intelligence · Computer Science 2022-08-05 Federico Carli , Manuele Leonelli , Gherardo Varando

Optimal Generalized Decision Trees via Integer Programming

Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if…

Machine Learning · Computer Science 2019-08-14 Oktay Gunluk , Jayant Kalagnanam , Minhan Li , Matt Menickelly , Katya Scheinberg

A Generalized Multinomial Distribution from Dependent Categorical Random Variables

Categorical random variables are a common staple in machine learning methods and other applications across disciplines. Many times, correlation within categorical predictors exists, and has been noted to have an effect on various algorithm…

Probability · Mathematics 2017-01-25 Rachel Traylor

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of…

Machine Learning · Statistics 2017-03-06 Yacine Jernite , Anna Choromanska , David Sontag

Large Scale Prediction with Decision Trees

This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows…

Machine Learning · Statistics 2023-11-15 Jason M. Klusowski , Peter M. Tian

Partition Tree: Conditional Density Estimation over General Outcome Spaces

We propose Partition Tree, a novel tree-based framework for conditional density estimation over general outcome spaces that supports both continuous and categorical variables within a unified formulation. Our approach models conditional…

Machine Learning · Computer Science 2026-05-13 Felipe Angelim , Alessandro Leite