Related papers: Survey Data Integration for Distribution Function …
Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high cost and increasing non-response rates. Data integration…
In the age of big data, nonprobability surveys are becoming increasingly abundant. Data integration techniques involving both probability and nonprobability surveys are being extensively used for providing improved estimates for finite…
In survey analysis, the estimation of the cumulative distribution function (cdf) is of great interest: it allows for instance to derive quantiles estimators or other non linear parameters derived from the cdf. We consider the case where the…
A reduced-bias nonparametric estimator of the cumulative distribution function (CDF) and the survival function is proposed using infinite-order kernels. Fourier transform theory on generalized functions is utilized to obtain the improved…
The cumulative distribution function (CDF) is fundamental for characterizing random variables, making it essential in applications that require privacy-preserving data analysis. This paper introduces a novel framework for constructing…
Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining a probability sample with big…
Forward simulation-based uncertainty quantification that studies the distribution of quantities of interest (QoI) is a crucial component for computationally robust engineering design and prediction. There is a large body of literature…
The estimation of cumulative distribution functions (CDF) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional…
We study nonparametric estimation of univariate cumulative distribution functions (CDFs) pertaining to data missing at random. The proposed estimators smooth the inverse probability weighted (IPW) empirical CDF with the Bernstein operator,…
Normalizing flows model a complex target distribution in terms of a bijective transform operating on a simple base distribution. As such, they enable tractable computation of a number of important statistical quantities, particularly…
The use of big data in official statistics and the applied sciences is accelerating, but statistics computed using only big data often suffer from substantial selection bias. This leads to inaccurate estimation and invalid statistical…
Learning the multivariate distribution of data is a core challenge in statistics and machine learning. Traditional methods aim for the probability density function (PDF) and are limited by the curse of dimensionality. Modern neural methods…
This paper considers the problem of estimating the cumulative distribution function and probability density function of a random variable using data quantized by uniform and non-uniform quantizers. A simple estimator is proposed based on…
In this paper we study predictive mean matching mass imputation estimators to integrate data from probability and non-probability samples. We consider two approaches: matching predicted to predicted ($\hat{y}-\hat{y}$~matching; PMM A) and…
A quantile is defined as a value below which random draws from a given distribution falls with a given probability. In a centralized setting where the cumulative distribution function (CDF) is unknown, the empirical CDF (ECDF) can be used…
The statistical challenges in using big data for making valid statistical inference in the finite population have been well documented in literature. These challenges are due primarily to statistical bias arising from under-coverage in the…
We propose a method for finding a cumulative distribution function (cdf) that minimizes the distance to a given cdf, while belonging to an ambiguity set constructed relative to another cdf and, possibly, incorporating soft information. Our…
We consider the problem of evaluating the cumulative distribution function (CDF) of the sum of order statistics, which serves to compute outage probability (OP) values at the output of generalized selection combining receivers. Generally,…
Doubly robust estimators combine an inverse probability weighting estimator and a mass imputation estimator. Several doubly robust estimators for estimating the population mean (or prevalence) of an outcome have been proposed for…
The aim of survey statistics is to produce estimates with a minimal bias and a corresponding acceptable variance given a specific budget, preferable with a minor response burden for the participants. In recent years, considerable efforts…