Related papers: Does imputation matter? Benchmark for predictive m…
Missing values are prevalent across various fields, posing challenges for training and deploying predictive models. In this context, imputation is a common practice, driven by the hope that accurate imputations will enhance predictions.…
Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…
Decision making from data involves identifying a set of attributes that contribute to effective decision making through computational intelligence. The presence of missing values greatly influences the selection of right set of attributes…
Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. However, a public benchmark with appropriate performance metrics for…
Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse,…
Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost…
Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has realized an increasing trend towards the usage of modern Machine Learning algorithms for…
Imputation is an attractive tool for dealing with the widespread issue of missing values. Consequently, studying and developing imputation methods has been an active field of research over the last decade. Faced with an imputation task and…
Clinical researchers often select among and evaluate risk prediction models using standard machine learning metrics based on confusion matrices. However, if these models are used to allocate interventions to patients, standard metrics…
Many datasets suffer from missing values due to various reasons,which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use…
BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for…
Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows…
Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In…
Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…
The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets…
Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset…
Machine learning models are often used to inform real world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given…
Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data this means estimating the outcome of counter-factual treatments (i.e. including treatments that were not observed) on a…