Related papers: Predicting missing values: A good idea?

Imputing With Predictive Mean Matching Can Be Severely Biased When Values Are Missing At Random

Predictive mean matching (PMM) is a popular imputation strategy that imputes missing values by borrowing observed values from other cases with similar expectations. We show that, unlike other imputation strategies, PMM is not guaranteed to…

Methodology · Statistics 2025-07-01 Paul T. von Hippel

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

New Methods of Enhancing Prediction Accuracy in Linear Models with Missing Data

In this paper, prediction for linear systems with missing information is investigated. New methods are introduced to improve the Mean Squared Error (MSE) on the test set in comparison to state-of-the-art methods, through appropriate tuning…

Machine Learning · Statistics 2017-01-04 Mohammad Amin Fakharian, Ashkan Esmaeili, Farokh Marvasti

A Characterization of Mean Squared Error for Estimator with Bagging

Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Though bagging is now widely used in practice and many empirical studies have explored its behavior,…

Machine Learning · Computer Science 2019-08-08 Martin Mihelich, Charles Dognin, Yan Shu, Michael Blot

On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets

Missing values or data is one popular characteristic of real-world datasets, especially healthcare data. This could be frustrating when using machine learning algorithms on such datasets, simply because most machine learning models perform…

Machine Learning · Computer Science 2024-03-25 Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul

MMSE Approximation For Sparse Coding Algorithms Using Stochastic Resonance

Sparse coding refers to the pursuit of the sparsest representation of a signal in a typically overcomplete dictionary. From a Bayesian perspective, sparse coding provides a Maximum a Posteriori (MAP) estimate of the unknown vector under a…

Signal Processing · Electrical Eng. & Systems 2019-09-04 Dror Simon, Jeremias Sulam, Yaniv Romano, Yue M. Lu, Michael Elad

Estimating Noisy Order Statistics

This paper proposes an estimation framework to assess the performance of sorting over perturbed/noisy data. In particular, the recovering accuracy is measured in terms of Minimum Mean Square Error (MMSE) between the values of the sorting…

Information Theory · Computer Science 2019-09-04 Alex Dytso, Martina Cardone, H. Vincent Poor

Do we Need Dozens of Methods for Real World Missing Value Imputation?

Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…

Computation · Statistics 2025-11-10 Krystyna Grzesiak, Christophe Muller, Julie Josse, Jeffrey Näf

Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…

Econometrics · Economics 2020-12-22 Mochen Yang, Edward McFowland, Gordon Burtch, Gediminas Adomavicius

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling - best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of…

Methodology · Statistics 2022-01-11 Rahul Mazumder, Peter Radchenko, Antoine Dedieu

Noise Benefits in Expectation-Maximization Algorithms

This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum…

Machine Learning · Statistics 2014-11-26 Osonde Adekorede Osoba

Estimation in Gaussian Noise: Properties of the Minimum Mean-Square Error

Consider the minimum mean-square error (MMSE) of estimating an arbitrary random variable from its observation contaminated by Gaussian noise. The MMSE can be regarded as a function of the signal-to-noise ratio (SNR) as well as a functional…

Information Theory · Computer Science 2010-04-21 Dongning Guo, Yihong Wu, Shlomo Shamai, Sergio Verdu

Bayesian Estimation for Continuous-Time Sparse Stochastic Processes

We consider continuous-time sparse stochastic processes from which we have only a finite number of noisy/noiseless samples. Our goal is to estimate the noiseless samples (denoising) and the signal in-between (interpolation problem). By…

Machine Learning · Computer Science 2015-06-11 Arash Amini, Ulugbek S. Kamilov, Emrah Bostan, Michael Unser

Machine-learning error models for approximate solutions to parameterized systems of nonlinear equations

This work proposes a machine-learning framework for constructing statistical models of errors incurred by approximate solutions to parameterized systems of nonlinear equations. These approximate solutions may arise from early termination of…

Numerical Analysis · Computer Science 2019-02-18 Brian A. Freno, Kevin T. Carlberg

Multiple Imputation Methods under Extreme Values

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…

Computation · Statistics 2026-02-05 Enzo Porto Brasil

Bayesian Post-Processing Methods for Jitter Mitigation in Sampling

Minimum mean squared error (MMSE) estimators of signals from samples corrupted by jitter (timing noise) and additive noise are nonlinear, even when the signal prior and additive noise have normal distributions. This paper develops a…

Applications · Statistics 2015-03-24 Daniel S. Weller, Vivek K Goyal

Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies

Data corruption, including missing and noisy data, poses significant challenges in real-world machine learning. This study investigates the effects of data corruption on model performance and explores strategies to mitigate these effects…

Machine Learning · Computer Science 2025-05-22 Qi Liu, Wanjing Ma

Interpretable Prediction Rule Ensembles in the Presence of Missing Data

Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to…

Applications · Statistics 2024-10-22 Vincent Schroeder, Jakob Schwerter, Marjolein Fokkema, Philipp Doebler

A Robust Missing Value Imputation Method MifImpute For Incomplete Molecular Descriptor Data And Comparative Analysis With Other Missing Value Imputation Methods

Missing data imputation is an important research topic in data mining. Large-scale molecular descriptor data may contain missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a…

Computational Engineering, Finance, and Science · Computer Science 2013-12-13 Doreswamy, Chanabasayya M. Vastrad

Feature-Wise Mixing for Mitigating Contextual Bias in Predictive Supervised Learning

Bias in predictive machine learning (ML) models is a fundamental challenge due to the skewed or unfair outcomes produced by biased models. Existing mitigation strategies rely on either post-hoc corrections or rigid constraints. However,…

Machine Learning · Computer Science 2025-07-01 Yash Vardhan Tomar