Related papers: Binary Classification: Is Boosting stronger than B…

Tree Boosting Methods for Balanced and Imbalanced Classification and their Robustness Over Time in Risk Assessment

Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult…

Machine Learning · Computer Science 2025-04-28 Gissel Velarde, Michael Weichert, Anuj Deshmunkh, Sanjay Deshmane, Anindya Sudhir, Khushboo Sharma, Vaibhav Joshi

Improved Weighted Random Forest for Classification Problems

Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of…

Machine Learning · Computer Science 2021-03-01 Mohsen Shahhosseini, Guiping Hu

Adaptive Forests For Classification

Random Forests (RF) and Extreme Gradient Boosting (XGBoost) are two of the most widely used and highly performing classification and regression models. They aggregate equally weighted CART trees, generated randomly in RF or sequentially in…

Machine Learning · Computer Science 2025-10-28 Dimitris Bertsimas, Yubing Cui

Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in…

Machine Learning · Statistics 2025-11-13 Jing Shang, James Bannon, Benjamin Haibe-Kains, Robert Tibshirani

XGBoost: A Scalable Tree Boosting System

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results…

Machine Learning · Computer Science 2016-06-14 Tianqi Chen, Carlos Guestrin

NRGBoost: Energy-Based Generative Boosted Trees

Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular…

Machine Learning · Computer Science 2025-04-21 João Bravo

Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance

As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like…

Machine Learning · Statistics 2020-11-10 Lucas Mentch, Siyu Zhou

BoostTree and BoostForest for Ensemble Learning

Bootstrap aggregating (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite model for more accurate and more reliable performance. They have been widely used in…

Machine Learning · Computer Science 2022-12-07 Changming Zhao, Dongrui Wu, Jian Huang, Ye Yuan, Hai-Tao Zhang, Ruimin Peng, Zhenhua Shi

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…

Machine Learning · Statistics 2019-05-28 Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

A Comparison of Modeling Preprocessing Techniques

This paper compares the performance of various data processing methods in terms of predictive performance for structured data. This paper also seeks to identify and recommend preprocessing methodologies for tree-based binary classification…

Methodology · Statistics 2023-02-27 Tosan Johnson, Alice J. Liu, Syed Raza, Aaron McGuire

FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy

Random Forest remains one of Data Mining's most enduring ensemble algorithms, achieving well-documented levels of accuracy and processing speed, as well as regularly appearing in new research. However, with data mining now reaching the…

Machine Learning · Computer Science 2020-04-07 Darren Yates, Md Zahidul Islam

Robust Classification of High Dimension Low Sample Size Data

The robustification of pattern recognition techniques has been the subject of intense research in recent years. Despite the multiplicity of papers on the subject, very few articles have deeply explored the topic of robust classification in…

Applications · Statistics 2015-01-06 Necla Gunduz, Ernest Fokoue

hi-RF: Incremental Learning Random Forest for large-scale multi-class Data Classification

In recent years, dynamically growing data and incrementally growing number of classes pose new challenges to large-scale data classification research. Most traditional methods struggle to balance the precision and computational burden when…

Machine Learning · Computer Science 2016-11-01 Tingting Xie, Yuxing Peng, Changjian Wang

Which Imputation Fits Which Feature Selection Method? A Survey-Based Simulation Study

Tree-based learning methods such as Random Forest and XGBoost are still the gold-standard prediction methods for tabular data. Feature importance measures are usually considered for feature selection as well as to assess the effect of…

Applications · Statistics 2024-12-19 Jakob Schwerter, Andrés Romero, Florian Dumpert, Markus Pauly

A Powerful Random Forest Featuring Linear Extensions (RaFFLE)

Random forests are widely used in regression. However, the decision trees used as base learners are poor approximators of linear relationships. To address this limitation we propose RaFFLE (Random Forest Featuring Linear Extensions), a…

Machine Learning · Computer Science 2025-02-17 Jakob Raymaekers, Peter J. Rousseeuw, Thomas Servotte, Tim Verdonck, Ruicong Yao

A Numerical Transform of Random Forest Regressors corrects Systematically-Biased Predictions

Over the past decade, random forest models have become widely used as a robust method for high-dimensional data regression tasks. In part, the popularity of these models arises from the fact that they require little hyperparameter tuning…

Machine Learning · Computer Science 2020-03-18 Shipra Malhotra, John Karanicolas

Diversity Conscious Refined Random Forest

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and…

Machine Learning · Computer Science 2025-07-08 Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

Calibrated Boosting-Forest

Excellent ranking power along with well calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest that captures both. This novel technique is an ensemble of…

Machine Learning · Statistics 2017-11-15 Haozhen Wu

Evolutionary bagging for ensemble learning

Ensemble learning has gained success in machine learning with major advantages over other learning methods. Bagging is a prominent ensemble learning method that creates subgroups of data, known as bags, that are trained by individual…

Neural and Evolutionary Computing · Computer Science 2022-09-07 Giang Ngo, Rodney Beard, Rohitash Chandra

Random Forests for Big Data

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include online data and data heterogeneity.…

Machine Learning · Statistics 2017-03-23 Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix