Related papers: Does imputation matter? Benchmark for predictive m…

Imputation for prediction: beware of diminishing returns

Missing values are prevalent across various fields, posing challenges for training and deploying predictive models. In this context, imputation is a common practice, driven by the hope that accurate imputations will enhance predictions.…

Artificial Intelligence · Computer Science 2025-02-21 Marine Le Morvan , Gaël Varoquaux

Classification of datasets with imputed missing values: does imputation quality matter?

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Machine Learning · Computer Science 2023-12-20 Tolou Shadbahr , Michael Roberts , Jan Stanczuk , Julian Gilbey , Philip Teare , Sören Dittmer , Matthew Thorpe , Ramon Vinas Torne , Evis Sala , Pietro Lio , Mishal Patel , AIX-COVNET Collaboration , James H. F. Rudd , Tuomas Mirtti , Antti Rannikko , John A. D. Aston , Jing Tang , Carola-Bibiane Schönlieb

Performance comparison of State-of-the-art Missing Value Imputation Algorithms on Some Bench mark Datasets

Decision making from data involves identifying a set of attributes that contribute to effective decision making through computational intelligence. The presence of missing values greatly influences the selection of right set of attributes…

Machine Learning · Computer Science 2013-07-23 M. Naresh Kumar

Do we Need Dozens of Methods for Real World Missing Value Imputation?

Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…

Computation · Statistics 2025-11-10 Krystyna Grzesiak , Christophe Muller , Julie Josse , Jeffrey Näf

Benchmarking Simulation-Based Inference

Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. However, a public benchmark with appropriate performance metrics for…

Machine Learning · Statistics 2021-04-12 Jan-Matthis Lueckmann , Jan Boelts , David S. Greenberg , Pedro J. Gonçalves , Jakob H. Macke

Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse,…

Methodology · Statistics 2022-08-23 Mehdi Dagdoug , Camelia Goga , David Haziza

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos , Rafael Valle

No imputation without representation

By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost…

Machine Learning · Computer Science 2024-10-31 Oliver Urs Lenz , Daniel Peralta , Chris Cornelis

On the Relation between Prediction and Imputation Accuracy under Missing Covariates

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has realized an increasing trend towards the usage of modern Machine Learning algorithms for…

Machine Learning · Statistics 2022-03-23 Burim Ramosaj , Justus Tulowietzki , Markus Pauly

How to rank imputation methods?

Imputation is an attractive tool for dealing with the widespread issue of missing values. Consequently, studying and developing imputation methods has been an active field of research over the last decade. Faced with an imputation task and…

Methodology · Statistics 2025-07-16 Jeffrey Näf , Krystyna Grzesiak , Erwan Scornet

Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model

Clinical researchers often select among and evaluate risk prediction models using standard machine learning metrics based on confusion matrices. However, if these models are used to allocate interventions to patients, standard metrics…

Machine Learning · Statistics 2020-06-03 Alejandro Schuler , Aashish Bhardwaj , Vincent Liu

Iterative missing value imputation based on feature importance

Many datasets suffer from missing values due to various reasons,which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use…

Machine Learning · Computer Science 2024-08-14 Cong Guo , Chun Liu , Wei Yang

Benchmarking missing-values approaches for predictive models on health databases

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for…

Machine Learning · Computer Science 2022-02-23 Alexandre Perez-Lebel , Gaël Varoquaux , Marine Le Morvan , Julie Josse , Jean-Baptiste Poline

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method

Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows…

Machine Learning · Computer Science 2025-09-30 Ruikai Yang , Fan He , Mingzhen He , Kaijie Wang , Xiaolin Huang

Certain and Approximately Certain Models for Statistical Learning

Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In…

Machine Learning · Statistics 2024-03-05 Cheng Zhen , Nischal Aryal , Arash Termehchy , Alireza Aghasi , Amandeep Singh Chabada

The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models

Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…

Machine Learning · Computer Science 2025-10-28 Timo Freiesleben , Sebastian Zezulka

Data Excellence for AI: Why Should You Care

The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets…

Machine Learning · Computer Science 2021-11-23 Lora Aroyo , Matthew Lease , Praveen Paritosh , Mike Schaekermann

How Benchmark Prediction from Fewer Data Misses the Mark

Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset…

Machine Learning · Computer Science 2025-06-10 Guanhua Zhang , Florian E. Dorner , Moritz Hardt

Predictive Multiplicity in Probabilistic Classification

Machine learning models are often used to inform real world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given…

Machine Learning · Computer Science 2023-06-27 Jamelle Watson-Daniels , David C. Parkes , Berk Ustun

Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis

Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data this means estimating the outcome of counter-factual treatments (i.e. including treatments that were not observed) on a…

Methodology · Statistics 2018-03-21 Yishai Shimoni , Chen Yanover , Ehud Karavani , Yaara Goldschmnidt