Related papers: Adaptive Optimization for Prediction with Missing …

Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data

Short-term forecasting models typically assume the availability of input data (features) when they are deployed and in use. However, equipment failures, disruptions, or cyberattacks may lead to missing features when such models are used…

Machine Learning · Statistics 2025-06-30 Akylas Stratigakos, Panagiotis Andrianesis

What's a good imputation to predict with missing values?

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical…

Machine Learning · Statistics 2021-12-01 Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

Modular Regression: Improving Linear Models by Incorporating Auxiliary Data

This paper develops a new framework, called modular regression, to utilize auxiliary information -- such as variables other than the original features or additional data sets -- in the training process of linear models. At a high level, our…

Methodology · Statistics 2023-11-27 Ying Jin, Dominik Rothenhäusler

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

High-dimensional Linear Discriminant Analysis: Optimality, Adaptive Algorithm, and Missing Data

This paper aims to develop an optimality theory for linear discriminant analysis in the high-dimensional setting. A data-driven and tuning free classification rule, which is based on an adaptive constrained $\ell_1$ minimization approach,…

Methodology · Statistics 2018-04-10 T. Tony Cai, Linjun Zhang

A primer on linear classification with missing data

Supervised learning with missing data aims at building the best prediction of a target output based on partially-observed inputs. Major approaches to address this problem can be decomposed into $(i)$ impute-then-predict strategies, which…

Statistics Theory · Mathematics 2024-10-14 Angel D Reyero Lobo, Alexis Ayme, Claire Boyer, Erwan Scornet

An Investigation of Methods for Handling Missing Data with Penalized Regression

We investigate methods for penalized regression in the presence of missing observations. This paper introduces a method for estimating the parameters which compensates for the missing observations. We first derive an unbiased estimator of…

Applications · Statistics 2013-10-09 Yunjin Choi, Robert Tibshirani

RIGID: Robust Linear Regression with Missing Data

We present a robust framework to perform linear regression with missing entries in the features. By considering an elliptical data distribution, and specifically a multivariate normal model, we are able to conditionally formulate a…

Machine Learning · Computer Science 2022-11-10 Alireza Aghasi, MohammadJavad Feizollahi, Saeed Ghadimi

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

MAIN: Multihead-Attention Imputation Networks

The problem of missing data, usually absent in curated and competition-standard datasets, is an unfortunate reality for most machine learning models used in industry applications. Recent work has focused on understanding the nature and the…

Machine Learning · Computer Science 2022-01-25 Spyridon Mouselinos, Kyriakos Polymenakos, Antonis Nikitakis, Konstantinos Kyriakopoulos

Machine learning with incomplete datasets using multi-objective optimization models

Machine learning techniques have been developed to learn from complete data. When missing values exist in a dataset, the incomplete data should be preprocessed separately by removing data points with missing values or imputation. In this…

Machine Learning · Computer Science 2020-12-25 Hadi A. Khorshidi, Michael Kirley, Uwe Aickelin

Probabilistic wind power forecasting resilient to missing values: an adaptive quantile regression approach

Probabilistic wind power forecasting approaches have significantly advanced in recent decades. However, forecasters often assume data completeness and overlook the challenge of missing values resulting from sensor failures, network…

Applications · Statistics 2024-04-25 Honglin Wen

On the Relation between Prediction and Imputation Accuracy under Missing Covariates

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has realized an increasing trend towards the usage of modern Machine Learning algorithms for…

Machine Learning · Statistics 2022-03-23 Burim Ramosaj, Justus Tulowietzki, Markus Pauly

Sparse Linear Regression With Missing Data

This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the…

Machine Learning · Statistics 2015-03-31 Ravi Ganti, Rebecca M. Willett

Adaptive Bayesian Linear Regression for Automated Machine Learning

To solve a machine learning problem, one typically needs to perform data preprocessing, modeling, and hyperparameter tuning, which is known as model selection and hyperparameter optimization. The goal of automated machine learning (AutoML)…

Machine Learning · Computer Science 2019-04-19 Weilin Zhou, Frederic Precioso

On regression and classification with possibly missing response variables in the data

This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both predictors and…

Statistics Theory · Mathematics 2022-12-07 Majid Mojirsheibani, William Pouliot, Andre Shakhbandaryan

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to…

Machine Learning · Computer Science 2019-05-16 Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization

Many real-world decision processes are modeled by optimization problems whose defining parameters are unknown and must be inferred from observable data. The Predict-Then-Optimize framework uses machine learning models to predict unknown…

Machine Learning · Computer Science 2023-11-23 James Kotary, Vincenzo Di Vito, Jacob Christopher, Pascal Van Hentenryck, Ferdinando Fioretto

When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where…

Machine Learning · Statistics 2026-02-03 Christophe Muller, Erwan Scornet, Julie Josse

Coupled Training with Privileged Information and Unlabeled Data

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that…

Machine Learning · Statistics 2026-05-25 Jiahao Shi, Omar Hagrass, Jason M. Klusowski