Related papers: JoinBoost: Grow Trees Over Normalized Data Using O…

XGenBoost: Synthesizing Small and Large Tabular Datasets with XGBoost

Tree ensembles such as XGBoost are often preferred for discriminative tasks in mixed-type tabular data, due to their inductive biases, minimal hyperparameter tuning, and training efficiency. We argue that these qualities, when leveraged…

Machine Learning · Computer Science 2026-03-10 Jim Achterberg, Marcel Haas, Bram van Dijk, Marco Spruit

Tree Boosting Methods for Balanced and Imbalanced Classification and their Robustness Over Time in Risk Assessment

Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult…

Machine Learning · Computer Science 2025-04-28 Gissel Velarde, Michael Weichert, Anuj Deshmunkh, Sanjay Deshmane, Anindya Sudhir, Khushboo Sharma, Vaibhav Joshi

BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification

Large language models (LLMs) have recently been adapted to tabular prediction by serializing structured features into natural language, but their performance in low-data regimes remains limited compared to gradient-boosted decision trees…

Machine Learning · Computer Science 2026-05-12 Yi-Siang Wang, Kuan-Yu Chen, Yu-Chen Den, Darby Tien-Hao Chang

A Simple and Fast Baseline for Tuning Large XGBoost Models

XGBoost, a scalable tree boosting algorithm, has proven effective for many prediction tasks of practical interest, especially using tabular datasets. Hyperparameter tuning can further improve the predictive performance, but unlike neural…

Machine Learning · Computer Science 2021-11-16 Sanyam Kapoor, Valerio Perrone

NRGBoost: Energy-Based Generative Boosted Trees

Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular…

Machine Learning · Computer Science 2025-04-21 João Bravo

XGBoost: A Scalable Tree Boosting System

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results…

Machine Learning · Computer Science 2016-06-14 Tianqi Chen, Carlos Guestrin

Faster Boosting with Smaller Memory

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires that the memory size is sufficient to hold a 2-3 multiple of the training set…

Machine Learning · Computer Science 2019-10-29 Julaiti Alafate, Yoav Freund

Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting

Accurate demand forecasting is critical for brick-and-mortar retailers to optimize inventory management and minimize costs. This study evaluates statistical baselines, tree-based ensembles (XGBoost and LightGBM), and deep learning…

Machine Learning · Computer Science 2026-03-12 Luka Hobor, Mario Brcic, Lidija Polutnik, Ante Kapetanovic

StructBoost: Boosting Methods for Predicting Structured Output Variables

Boosting is a method for learning a single accurate predictor by linearly combining a set of less accurate weak learners. Recently, structured learning has found many applications in computer vision. Inspired by structured support vector…

Machine Learning · Computer Science 2020-03-10 Chunhua Shen, Guosheng Lin, Anton van den Hengel

Tabular Data: Deep Learning is Not All You Need

A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several…

Machine Learning · Computer Science 2021-11-24 Ravid Shwartz-Ziv, Amitai Armon

Enhanced version of AdaBoostM1 with J48 Tree learning method

Machine Learning focuses on the construction and study of systems that can learn from data. This is connected with the classification problem, which usually is what Machine Learning algorithms are designed to solve. When a machine learning…

Machine Learning · Statistics 2018-02-13 Kyongche Kang, Jack Michalak

PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques

Stochastic Gradient TreeBoost is often found in many winning solutions in public data science challenges. Unfortunately, the best performance requires extensive parameter tuning and can be prone to overfitting. We propose PaloBoost, a…

Machine Learning · Statistics 2018-07-24 Yubin Park, Joyce C. Ho

LLMBoost: Make Large Language Models Stronger with Boosting

Ensemble learning of LLMs has emerged as a promising alternative to enhance performance, but existing approaches typically treat models as black boxes, combining the inputs or final outputs while overlooking the rich internal…

Machine Learning · Computer Science 2025-12-30 Zehao Chen, Tianxiang Ai, Yifei Li, Gongxun Li, Yuyang Wei, Wang Zhou, Guanghui Li, Bin Yu, Zhijun Chen, Hailong Sun, Fuzhen Zhuang, Jianxin Li, Deqing Wang, Yikun Ban

Unified Robust Boosting

Boosting is a popular algorithm in supervised machine learning with wide applications in regression and classification problems. It combines weak learners, such as regression trees, to obtain accurate predictions. However, in the presence…

Computation · Statistics 2025-02-06 Zhu Wang

Boosting gets full Attention for Relational Learning

More often than not in benchmark supervised ML, tabular data is flat, i.e. consists of a single $m \times d$ (rows, columns) file, but cases abound in the real world where observations are described by a set of tables with structural…

Machine Learning · Computer Science 2024-02-26 Mathieu Guillame-Bert, Richard Nock

A Federated Learning Benchmark on Tabular Data: Comparing Tree-Based Models and Neural Networks

Federated Learning (FL) has lately gained traction as it addresses how machine learning models train on distributed datasets. FL was designed for parametric models, namely Deep Neural Networks (DNNs). Thus, it has shown promise on image and…

Machine Learning · Computer Science 2024-05-06 William Lindskog, Christian Prehofer

MorphBoost: Self-Organizing Universal Gradient Boosting with Adaptive Tree Morphing

Traditional gradient boosting algorithms employ static tree structures with fixed splitting criteria that remain unchanged throughout training, limiting their ability to adapt to evolving gradient distributions and problem-specific…

Machine Learning · Computer Science 2025-11-18 Boris Kriuk

Binary Classification: Is Boosting stronger than Bagging?

Random Forests have been one of the most popular bagging methods in the past few decades, especially due to their success at handling tabular datasets. They have been extensively studied and compared to boosting models, like XGBoost, which…

Machine Learning · Computer Science 2024-10-28 Dimitris Bertsimas, Vasiliki Stoumpou

Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes

Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent…

Computation and Language · Computer Science 2025-02-07 Mayuka Jayawardhana, Renbo, Samuel Dooley, Valeriia Cherepanova, Andrew Gordon Wilson, Frank Hutter, Colin White, Tom Goldstein, Micah Goldblum

A Comparative Analysis of XGBoost

XGBoost is a scalable ensemble technique based on gradient boosting that has demonstrated to be a reliable and efficient machine learning challenge solver. This work proposes a practical analysis of how this novel technique works in terms…

Machine Learning · Computer Science 2023-05-05 Candice Bentéjac, Anna Csörgő, Gonzalo Martínez-Muñoz