Related papers: Prediction Models That Learn to Avoid Missing Valu…

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as…

Machine Learning · Computer Science 2022-04-15 Haewon Jeong, Hao Wang, Flavio P. Calmon

Impact of Missing Values in Machine Learning: A Comprehensive Analysis

Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing…

Machine Learning · Computer Science 2024-10-14 Abu Fuad Ahmad, Md Shohel Sayeed, Khaznah Alshammari, Istiaque Ahmed

Handling Missing Data in Decision Trees: A Probabilistic Approach

Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine…

Machine Learning · Computer Science 2020-07-01 Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Leveraging Predictive Equivalence in Decision Trees

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be…

Machine Learning · Computer Science 2025-10-15 Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin

PROMISSING: Pruning Missing Values in Neural Networks

While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network…

Machine Learning · Computer Science 2022-06-06 Seyed Mostafa Kia, Nastaran Mohammadian Rad, Daniel van Opstal, Bart van Schie, Andre F. Marquand, Josien Pluim, Wiepke Cahn, Hugo G. Schnack

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire, Robin Genuer, Cecile Proust-Lima

Sharing pattern submodels for prediction with missing values

Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels has been proposed as…

Machine Learning · Computer Science 2023-11-27 Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson

Benchmarking missing-values approaches for predictive models on health databases

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for…

Machine Learning · Computer Science 2022-02-23 Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline

Prediction with Missing Data via Bayesian Additive Regression Trees

We present a method for incorporating missing data in non-parametric statistical learning without the need for imputation. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with "Missingness Incorporated…

Machine Learning · Statistics 2014-02-14 Adam Kapelner, Justin Bleich

Missing Values Handling for Machine Learning Portfolios

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well…

Methodology · Statistics 2024-01-15 Andrew Y. Chen, Jack McCoy

The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those…

Machine Learning · Computer Science 2020-12-22 Naman Goel, Alfonso Amayuelas, Amit Deshpande, Amit Sharma

Learning from data with structured missingness

Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random' there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious,…

Machine Learning · Statistics 2023-04-05 Robin Mitra, Sarah F. McGough, Tapabrata Chakraborti, Chris Holmes, Ryan Copping, Niels Hagenbuch, Stefanie Biedermann, Jack Noonan, Brieuc Lehmann, Aditi Shenvi, Xuan Vinh Doan, David Leslie, Ginestra Bianconi, Ruben Sanchez-Garcia, Alisha Davies, Maxine Mackintosh, Eleni-Rosalina Andrinopoulou, Anahid Basiri, Chris Harbron, Ben D. MacArthur

LIFE: Learning Individual Features for Multivariate Time Series Prediction with Missing Values

Multivariate time series (MTS) prediction is ubiquitous in real-world fields, but MTS data often contains missing values. In recent years, there has been an increasing interest in using end-to-end models to handle MTS with missing values.…

Machine Learning · Computer Science 2023-05-11 Zhao-Yu Zhang, Shao-Qun Zhang, Yuan Jiang, Zhi-Hua Zhou

BEST: A decision tree algorithm that handles missing values

The main contribution of this paper is the development of a new decision tree algorithm. The proposed approach allows users to guide the algorithm through the data partitioning process. We believe this feature has many applications but in…

Machine Learning · Statistics 2020-10-27 Cédric Beaulac, Jeffrey S. Rosenthal

Evaluation Gaps in Machine Learning Practice

Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors including harms, benefits, and…

Machine Learning · Computer Science 2022-05-12 Ben Hutchinson, Negar Rostamzadeh, Christina Greer, Katherine Heller, Vinodkumar Prabhakaran

GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values

The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic…

Machine Learning · Computer Science 2026-05-25 Songyu Ke, Chenyu Wu, Yuxuan Liang, Huiling Qin, Junbo Zhang, Yu Zheng

From Learning to Meta-Learning: Reduced Training Overhead and Complexity for Communication Systems

Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations. Adaptation is done on a per-task basis, and retraining is needed…

Machine Learning · Computer Science 2021-10-22 Osvaldo Simeone, Sangwoo Park, Joonhyuk Kang

Multi-environment Invariance Learning with Missing Data

Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing…

Machine Learning · Statistics 2026-05-11 Yiran Jia, Jelena Bradic

MINTY: Rule-based Models that Minimize the Need for Imputing Features with Missing Values

Rule models are often preferred in prediction tasks with tabular inputs as they can be easily interpreted using natural language and provide predictive performance on par with more complex models. However, most rule models' predictions are…

Machine Learning · Computer Science 2023-11-27 Lena Stempfle, Fredrik D. Johansson