English
Related papers

Related papers: Multiple Imputation Through XGBoost

200 papers

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results…

Machine Learning · Computer Science 2016-06-14 Tianqi Chen , Carlos Guestrin

Statistical learning methods for automated variable selection, such as the Least Absolute Shrinkage and Selection Operator (LASSO), elastic nets, and gradient boosting, have become increasingly popular tools for building powerful prediction…

Machine Learning · Statistics 2026-04-13 Robert Kuchen

When data have a hierarchical structure, such as students nested within classrooms, ignoring dependencies between observations can compromise the validity of imputation procedures. Standard tree-based imputation methods implicitly assume…

Applications · Statistics 2025-03-21 Nico Föge , Jakob Schwerter , Ketevan Gurtskaia , Markus Pauly , Philipp Doebler

XGBoost is a scalable ensemble technique based on gradient boosting that has demonstrated to be a reliable and efficient machine learning challenge solver. This work proposes a practical analysis of how this novel technique works in terms…

Machine Learning · Computer Science 2023-05-05 Candice Bentéjac , Anna Csörgő , Gonzalo Martínez-Muñoz

Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult…

Machine Learning · Computer Science 2025-04-28 Gissel Velarde , Michael Weichert , Anuj Deshmunkh , Sanjay Deshmane , Anindya Sudhir , Khushboo Sharma , Vaibhav Joshi

XGBoost, a scalable tree boosting algorithm, has proven effective for many prediction tasks of practical interest, especially using tabular datasets. Hyperparameter tuning can further improve the predictive performance, but unlike neural…

Machine Learning · Computer Science 2021-11-16 Sanyam Kapoor , Valerio Perrone

Tree ensembles such as XGBoost are often preferred for discriminative tasks in mixed-type tabular data, due to their inductive biases, minimal hyperparameter tuning, and training efficiency. We argue that these qualities, when leveraged…

Machine Learning · Computer Science 2026-03-10 Jim Achterberg , Marcel Haas , Bram van Dijk , Marco Spruit

We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library (https://github.com/dmlc/xgboost). Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library.…

Machine Learning · Computer Science 2018-07-02 Rory Mitchell , Andrey Adinets , Thejaswi Rao , Eibe Frank

The XGBoost method has many advantages and is especially suitable for statistical analysis of big data, but its loss function is limited to convex functions. In many specific applications, a nonconvex loss function would be preferable. In…

Machine Learning · Computer Science 2022-01-20 Yang Guang

Missing data are present in most real world problems and need careful handling to preserve the prediction accuracy and statistical consistency in the downstream analysis. As the gold standard of handling missing data, multiple imputation…

Machine Learning · Computer Science 2021-12-23 Zongyu Dai , Zhiqi Bu , Qi Long

Missing data are ubiquitous in real world applications and, if not adequately handled, may lead to the loss of information and biased findings in downstream analysis. Particularly, high-dimensional incomplete data with a moderate sample…

Machine Learning · Computer Science 2022-12-23 Zongyu Dai , Zhiqi Bu , Qi Long

\Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study…

Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where…

Machine Learning · Computer Science 2025-09-04 Fatemeh Azad , Zoran Bosnić , Matjaž Kukar

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…

Methodology · Statistics 2019-05-15 Domonique W. Hodge , Sandra E. Safo , Qi Long

Current implementations of Gradient Boosting Machines are mostly designed for single-target regression tasks and commonly assume independence between responses when used in multivariate settings. As such, these models are not well suited if…

Machine Learning · Computer Science 2022-10-14 Alexander März

Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to…

Machine Learning · Computer Science 2022-08-23 Lorenzo Nespoli , Vasco Medici

This paper compares the performance of various data processing methods in terms of predictive performance for structured data. This paper also seeks to identify and recommend preprocessing methodologies for tree-based binary classification…

Methodology · Statistics 2023-02-27 Tosan Johnson , Alice J. Liu , Syed Raza , Aaron McGuire

Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel , Rory Mitchell , Eibe Frank , Bernhard Pfahringer , Talel Abdessalem , Albert Bifet

Global optimization of decision trees is a long-standing challenge in combinatorial optimization, yet such models play an important role in interpretable machine learning. Although the problem has been investigated for several decades, only…

Machine Learning · Computer Science 2026-02-03 Jiancheng Tu , Wenqi Fan , Zhibin Wu

Multiple imputation provides us with efficient estimators in model-based methods for handling missing data under the true model. It is also well-understood that design-based estimators are robust methods that do not require accurately…

Methodology · Statistics 2020-06-11 Kyunghee Han , Pamela A. Shaw , Thomas Lumley
‹ Prev 1 2 3 10 Next ›