Related papers: Partial Identification under Missing Data Using We…
We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome, but…
Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables,…
Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…
When data contains measurement errors, it is necessary to make assumptions relating the observed, erroneous data to the unobserved true phenomena of interest. These assumptions should be justifiable on substantive grounds, but are often…
Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism is dependent on the missing values themselves even conditioned on the observed data. Here, we…
Many important quantities of interest are only partially identified from observable data: the data can limit them to a set of plausible values, but not uniquely determine them. This paper develops a unified framework for covariate-assisted…
We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this…
Nonignorable missing data, where the probability of missingness depends on unobserved values, presents a significant challenge in statistical analysis. Traditional methods often rely on strong parametric assumptions that are difficult to…
This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: \textit{transparency,…
Nonignorable missing outcomes are common in real world datasets and often require strong parametric assumptions to achieve identification. These assumptions can be implausible or untestable, and so we may forgo them in favour of partially…
Latent feature models (LFM)s are widely employed for extracting latent structures of data. While offering high, parameter estimation is difficult with LFMs because of the combinational nature of latent features, and non-identifiability is a…
We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $\epsilon$…
Mediation analysis is widely used for exploring treatment mechanisms; however, it faces challenges when nonignorable missing confounders are present. Efficient inference of mediation effects and the efficiency loss due to nonignorable…
Many statistical estimands of interest (e.g., in regression or causality) are functions of the joint distribution of multiple random variables. But in some applications, data is not available that measures all random variables on each…
We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the…
Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a…
Missing Not at Random (MNAR) and nonnormal data are challenging to handle. Traditional missing data analytical techniques such as full information maximum likelihood estimation (FIML) may fail with nonnormal data as they are built on normal…
We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation…
During the past few decades, missing-data problems have been studied extensively, with a focus on the ignorable missing case, where the missing probability depends only on observable quantities. By contrast, research into non-ignorable…
Pre-trained machine learning (ML) predictions have been increasingly used to complement incomplete data to enable downstream scientific inquiries, but their naive integration risks biased inferences. Recently, multiple methods have been…