Related papers: Generative Forests
Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular…
While Generative Adversarial Networks (GANs) achieve spectacular results on unstructured data like images, there is still a gap on tabular data, data for which state of the art supervised learning still favours to a large extent decision…
A common objective in the analysis of tabular data is estimating the conditional distribution (in contrast to only producing predictions) of a set of "outcome" variables given a set of "covariates", which is sometimes referred to as the…
Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations…
Decision Trees and Random Forests are among the most widely used machine learning models, and often achieve state-of-the-art performance in tabular, domain-agnostic datasets. Nonetheless, being primarily discriminative models they lack…
We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural…
Tabular data is one of the most widely used data formats across various domains such as bioinformatics, healthcare, and marketing. As artificial intelligence moves towards a data-centric perspective, improving data quality is essential for…
Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested…
Generative AI has achieved remarkable empirical success, but from the perspective of statistics it often remains opaque: its predictions may be accurate, yet the underlying mechanism is difficult to interpret, analyze, and trust. This book…
In order to speed-up classification models when facing a large number of categories, one usual approach consists in organizing the categories in a particular structure, this structure being then used as a way to speed-up the prediction…
Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translating to better performance, and generalizing systematically beyond the training data distribution.…
We present a general architecture of deep differentiable forest and its sparse attention mechanism. The differentiable forest has the advantages of both trees and neural networks. Its structure is a simple binary tree, easy to use and…
We present Neural Random Forest Imitation - a novel approach for transforming random forests into neural networks. Existing methods propose a direct mapping and produce very inefficient architectures. In this work, we introduce an imitation…
More often than not in benchmark supervised ML, tabular data is flat, i.e. consists of a single $m \times d$ (rows, columns) file, but cases abound in the real world where observations are described by a set of tables with structural…
In recent times, many of the breakthroughs in various vision-related tasks have revolved around improving learning of deep models; these methods have ranged from network architectural improvements such as Residual Networks, to various forms…
Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically…
Tabular data is hard to acquire and is subject to missing values. This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data utilizing score-based diffusion and conditional flow…
Tabular data is one of the most widely used formats across industries, driving critical applications in areas such as finance, healthcare, and marketing. In the era of data-centric AI, improving data quality and representation has become…
Dynamic regression trees are an attractive option for automatic regression and classification with complicated response surfaces in on-line application settings. We create a sequential tree model whose state changes in time with the…
Deep Generative AI has been a long-standing essential topic in the machine learning community, which can impact a number of application areas like text generation and computer vision. The major paradigm to train a generative model is…