Related papers: Zero Inflation as a Missing Data Problem: a Proxy-…
Many data sets cannot be accurately described by standard probability distributions due to the excess number of zero values present. For example, zero-inflation is prevalent in microbiome data and single-cell RNA sequencing data, which…
We consider the complex data modeling problem motivated by the zero-inflated and overdispersed data from microbiome studies. Analyzing how microbiome abundance is associated with human biological features, such as BMI, is of great…
The disaggregated time series for the Consumer Price Index (CPI) often exhibits exact zero price changes, stemming from structural features of the data collection process. However, the currently prominent stochastic volatility model of…
Claim frequency data in insurance records the number of claims on insurance policies during a finite period of time. Given that insurance companies operate with multiple lines of insurance business where the claim frequencies on different…
Pattern-mixture models provide a transparent approach for handling missing data, where the full-data distribution is factorized in a way that explicitly shows the parts that can be estimated from observed data alone, and the parts that…
This research deals with the estimation and imputation of missing data in longitudinal models with a Poisson response variable inflated with zeros. A methodology is proposed that is based on the use of maximum likelihood, assuming that data…
Empirical researchers increasingly use upstream machine-learning (ML) methods to construct proxies for latent target variables from complex, unstructured data. A naive plug-in use of such proxies in downstream econometric models, however,…
Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero-inflated nature of such outcomes is…
The zero-inflated logistic regression model accommodates binary responses with excess zeros, which often arise from a latent mixture of susceptible and insusceptible subpopulations or asymmetric misclassification of the response. The model…
Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…
Given a system of analytic functions and an approximate zero, we introduce inflation to transform this system into one with a regular quadratic zero. This leads to a method for isolating a cluster of zeros of the given system.
Wearable devices collect time-varying biobehavioral data, offering opportunities to investigate how behaviors influence health outcomes. However, these data often contain measurement error and excess zeros (due to nonwear, sedentary…
Missing data arises when certain values are not recorded or observed for variables of interest. However, most statistical theory assumes complete data availability. To address incomplete databases, one approach is to fill the gaps…
Missing data theory concerns the statistical methods used when data are missing. Missing data occurs when some values are not stored or observed for variables of interest. However, most statistical theory assumes that data…
Zero-inflated continuous data appear in many fields: many observations are exactly zero, while the rest are continuously distributed. Due to the mixed structure of discreteness and continuity in its distribution,…
Zero inflation is a common nuisance when monitoring disease progression over time. This article proposes a new observation-driven model for zero-inflated and overdispersed count time series. The counts given the past history of the…
We consider the problem of estimating the mean of a random variable Y subject to non-ignorable missingness, i.e., where the missingness mechanism depends on Y. We connect the auxiliary proxy variable framework for non-ignorable missingness…
In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios,…
Economists are blessed with a wealth of data for analysis, but more often than not, values in some entries of the data matrix are missing. Various methods have been proposed to handle missing observations in a few variables. We exploit the…
It is often said that the fundamental problem of causal inference is a missing data problem -- the comparison of responses to two hypothetical treatment assignments is made difficult because for every experimental unit only one potential…