Related papers: Predicting missing values: A good idea?
Predictive mean matching (PMM) is a popular imputation strategy that imputes missing values by borrowing observed values from other cases with similar expectations. We show that, unlike other imputation strategies, PMM is not guaranteed to…
In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…
In this paper, prediction for linear systems with missing information is investigated. New methods are introduced to improve the Mean Squared Error (MSE) on the test set in comparison to state-of-the-art methods, through appropriate tuning…
Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Though bagging is now widely used in practice and many empirical studies have explored its behavior,…
Missing values or data is one popular characteristic of real-world datasets, especially healthcare data. This could be frustrating when using machine learning algorithms on such datasets, simply because most machine learning models perform…
Sparse coding refers to the pursuit of the sparsest representation of a signal in a typically overcomplete dictionary. From a Bayesian perspective, sparse coding provides a Maximum a Posteriori (MAP) estimate of the unknown vector under a…
This paper proposes an estimation framework to assess the performance of sorting over perturbed/noisy data. In particular, the recovering accuracy is measured in terms of Minimum Mean Square Error (MMSE) between the values of the sorting…
Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…
Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…
We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling - best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of…
This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum…
Consider the minimum mean-square error (MMSE) of estimating an arbitrary random variable from its observation contaminated by Gaussian noise. The MMSE can be regarded as a function of the signal-to-noise ratio (SNR) as well as a functional…
We consider continuous-time sparse stochastic processes from which we have only a finite number of noisy/noiseless samples. Our goal is to estimate the noiseless samples (denoising) and the signal in-between (interpolation problem). By…
This work proposes a machine-learning framework for constructing statistical models of errors incurred by approximate solutions to parameterized systems of nonlinear equations. These approximate solutions may arise from early termination of…
Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…
Minimum mean squared error (MMSE) estimators of signals from samples corrupted by jitter (timing noise) and additive noise are nonlinear, even when the signal prior and additive noise have normal distributions. This paper develops a…
Data corruption, including missing and noisy data, poses significant challenges in real-world machine learning. This study investigates the effects of data corruption on model performance and explores strategies to mitigate these effects…
Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to…
Missing data imputation is an important research topic in data mining. Large-scale Molecular descriptor data may contains missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a…
Bias in predictive machine learning (ML) models is a fundamental challenge due to the skewed or unfair outcomes produced by biased models. Existing mitigation strategies rely on either post-hoc corrections or rigid constraints. However,…