Related papers: Alternative formulas for synthetic dual system est…
Dual system estimation (DSE) is heavily used in Census Bureau operations. With DSE, it is important to implement methods that infer the population size among those with missing data from one or both data sources. The use of…
Binary observations are often repeated to improve data quality, creating technical replicates. Several scoring methods are commonly used to infer the actual individual state and obtain a probability for each state. The common practice of…
Measuring average differences in an outcome across racial or ethnic groups is a crucial first step for equity assessments, but researchers often lack access to data on individuals' races and ethnicities to calculate them. A common solution…
Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the…
Despite great efforts during censuses, some nonsampling errors, such as coverage error, are inevitable. Coverage error, which can be classified as either undercount or overcount, occurs when there is no unique…
In official statistics, dual system estimation (DSE) is a well-known tool to estimate the size of a population. Two sources are linked, and the number of units that are missed by both sources is estimated. Often dual system estimation is…
When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our…
The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are…
The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data…
Over the last century, the adoption of novel scientific methods for conducting the U.S. census has been met with wide-ranging receptions. Some methods were quietly embraced, while others sparked decades-long controversies. What accounts for…
We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the…
This paper considers the two-dataset problem, where data are collected from two potentially different populations sharing common aspects. This problem arises when data are collected by two different types of researchers or from two…
This paper proposes a method named Double Score Matching (DSM) for mass imputation and presents an application to inference with a nonprobability sample. DSM is a $k$-Nearest Neighbors algorithm that uses two balance scores…
In this paper, we provide a method to generate a synthetic population at various administrative levels for a country like India. This synthetic population is created using machine learning and statistical methods applied to survey data such…
Multicellular systems play a key role in bioprocess and biomedical engineering. Cell ensembles encountered in these setups show phenotypic variability, for example in size and biochemical composition. As this variability may result in undesired…
Motivated by various applications, we consider the problem of estimating the size (N) of a homogeneous human population from a dual-record system (DRS) (equivalently, a two-sample capture-recapture experiment). The likelihood estimate from the…
High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…
Statisticians have recently developed propensity score methods to improve generalizations from randomized experiments that do not employ random sampling. However, these methods typically rely on assumptions whose plausibility may be…
This paper proposes a statistical framework for using artificial intelligence to improve human decision making. The performance of each human decision maker is benchmarked against that of machine predictions. We replace the diagnoses made by…
For the dual-record system, in the context of human populations, the popular Chandrasekar-Deming model incorporates only the time variation effect on capture probabilities. However, in practice the population may undergo behavioral change after…