Related papers: Predicting missing values: A good idea?

200 papers

Predictive mean matching (PMM) is a popular imputation strategy that imputes missing values by borrowing observed values from other cases with similar expectations. We show that, unlike other imputation strategies, PMM is not guaranteed to…

Methodology · Statistics 2025-07-01 Paul T. von Hippel

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

In this paper, prediction for linear systems with missing information is investigated. New methods are introduced to improve the Mean Squared Error (MSE) on the test set in comparison to state-of-the-art methods, through appropriate tuning…

Machine Learning · Statistics 2017-01-04 Mohammad Amin Fakharian, Ashkan Esmaeili, Farokh Marvasti

Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Though bagging is now widely used in practice and many empirical studies have explored its behavior,…

Machine Learning · Computer Science 2019-08-08 Martin Mihelich, Charles Dognin, Yan Shu, Michael Blot

Missing values are a common characteristic of real-world datasets, especially healthcare data. This can be frustrating when using machine learning algorithms on such datasets, simply because most machine learning models perform…

Machine Learning · Computer Science 2024-03-25 Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul

Sparse coding refers to the pursuit of the sparsest representation of a signal in a typically overcomplete dictionary. From a Bayesian perspective, sparse coding provides a Maximum a Posteriori (MAP) estimate of the unknown vector under a…

Signal Processing · Electrical Eng. & Systems 2019-09-04 Dror Simon, Jeremias Sulam, Yaniv Romano, Yue M. Lu, Michael Elad

This paper proposes an estimation framework to assess the performance of sorting over perturbed/noisy data. In particular, the recovering accuracy is measured in terms of Minimum Mean Square Error (MMSE) between the values of the sorting…

Information Theory · Computer Science 2019-09-04 Alex Dytso, Martina Cardone, H. Vincent Poor

Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…

Computation · Statistics 2025-11-10 Krystyna Grzesiak, Christophe Muller, Julie Josse, Jeffrey Näf

Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…

Econometrics · Economics 2020-12-22 Mochen Yang, Edward McFowland, Gordon Burtch, Gediminas Adomavicius

We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling - best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of…

Methodology · Statistics 2022-01-11 Rahul Mazumder, Peter Radchenko, Antoine Dedieu

This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum…

Machine Learning · Statistics 2014-11-26 Osonde Adekorede Osoba

Consider the minimum mean-square error (MMSE) of estimating an arbitrary random variable from its observation contaminated by Gaussian noise. The MMSE can be regarded as a function of the signal-to-noise ratio (SNR) as well as a functional…

Information Theory · Computer Science 2010-04-21 Dongning Guo, Yihong Wu, Shlomo Shamai, Sergio Verdu

We consider continuous-time sparse stochastic processes from which we have only a finite number of noisy/noiseless samples. Our goal is to estimate the noiseless samples (denoising) and the signal in-between (interpolation problem). By…

Machine Learning · Computer Science 2015-06-11 Arash Amini, Ulugbek S. Kamilov, Emrah Bostan, Michael Unser

This work proposes a machine-learning framework for constructing statistical models of errors incurred by approximate solutions to parameterized systems of nonlinear equations. These approximate solutions may arise from early termination of…

Numerical Analysis · Computer Science 2019-02-18 Brian A. Freno, Kevin T. Carlberg

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…

Computation · Statistics 2026-02-05 Enzo Porto Brasil

Minimum mean squared error (MMSE) estimators of signals from samples corrupted by jitter (timing noise) and additive noise are nonlinear, even when the signal prior and additive noise have normal distributions. This paper develops a…

Applications · Statistics 2015-03-24 Daniel S. Weller, Vivek K Goyal

Data corruption, including missing and noisy data, poses significant challenges in real-world machine learning. This study investigates the effects of data corruption on model performance and explores strategies to mitigate these effects…

Machine Learning · Computer Science 2025-05-22 Qi Liu, Wanjing Ma

Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to…

Applications · Statistics 2024-10-22 Vincent Schroeder, Jakob Schwerter, Marjolein Fokkema, Philipp Doebler

Missing data imputation is an important research topic in data mining. Large-scale molecular descriptor data may contain missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a…

Computational Engineering, Finance, and Science · Computer Science 2013-12-13 Doreswamy, Chanabasayya M. Vastrad

Bias in predictive machine learning (ML) models is a fundamental challenge due to the skewed or unfair outcomes produced by biased models. Existing mitigation strategies rely on either post-hoc corrections or rigid constraints. However,…

Machine Learning · Computer Science 2025-07-01 Yash Vardhan Tomar