Related papers: Bootstrapping and Multiple Imputation Ensemble App…
Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is non-symmetric. It remains however unclear how to…
Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…
Missing observations are common in cluster randomised trials. Approaches taken to handling such missing data include: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed…
Multiple imputation is a highly recommended technique to deal with missing data, but the application to longitudinal datasets can be done in multiple ways. When a new wave of longitudinal data arrives, we can treat the combined data of…
Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where…
Multiple imputation provides an effective way to handle missing data. When several possible models are under consideration for the data, the multiple imputation is typically performed under a single-best model selected from the candidate…
Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…
Methods to handle missing data have been extensively explored in the context of estimation and descriptive studies, with multiple imputation being the most widely used method in clinical research. However, in the context of clinical risk…
Multiple imputation has become one of the most popular approaches for handling missing data in statistical analyses. Part of this success is due to Rubin's simple combination rules. These give frequentist valid inferences when the…
Multiple imputation is widely used to handle missing data. Although Rubin's combining rule is simple, it is not clear whether or not the standard multiple imputation inference is consistent when coupled with the commonly-used full sample…
Gaussian Mixture models (GMMs) are a powerful tool for clustering, classification and density estimation when clustering structures are embedded in the data. The presence of missing values can largely impact the GMMs estimation process,…
Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for…
Missing data is a pervasive issue in statistical analyses, affecting the reliability and validity of research across diverse scientific disciplines. Failure to adequately address missing data can lead to biased estimates and consequently…
Most practical data science problems encounter missing data. A wide variety of solutions exist, each with strengths and weaknesses that depend upon the missingness-generating process. Here we develop a theoretical framework for training and…
In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches…
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost…
Handling missing data is a central challenge in data-driven analysis. Modern imputation methods not only aim for accurate reconstruction but also differ in how they represent and quantify uncertainty. Yet, the reliability and calibration of…
This paper proposes a new non-parametric bootstrap method to quantify the uncertainty of average treatment effect estimate for the treated from matching estimators. More specifically, it seeks to quantify the uncertainty associated with the…
For multi-source data, blocks of variable information from certain sources are likely missing. Existing methods for handling missing data do not take structures of block-wise missing data into consideration. In this paper, we propose a…
The problem of missing data, usually absent incurated and competition-standard datasets, is an unfortunate reality for most machine learning models used in industry applications. Recent work has focused on understanding the nature and the…