Related papers: Alternative formulas for synthetic dual system est…

Log-linear Model for Dual System Estimation and Computational Considerations

Dual system estimation (DSE) is heavily used in Census Bureau operations. With DSE methods, it is important to implement methods to infer the population size among those with missing data from one or both data sources. The use of…

Computation · Statistics 2026-05-27 Zhiyuan Lu

Reconciling Binary Replicates: Beyond the Average

Binary observations are often repeated to improve data quality, creating technical replicates. Several scoring methods are commonly used to infer the actual individual state and obtain a probability for each state. The common practice of…

Methodology · Statistics 2025-01-24 Manuela Royer-Carenzi, Hadrien Lorenzo, Pierre Pudlo

Quantifying the Uncertainty of Imputed Demographic Disparity Estimates: The Dual-Bootstrap

Measuring average differences in an outcome across racial or ethnic groups is a crucial first step for equity assessments, but researchers often lack access to data on individuals' races and ethnicities to calculate them. A common solution…

Methodology · Statistics 2024-03-12 Benjamin Lu, Jia Wan, Derek Ouyang, Jacob Goldin, Daniel E. Ho

Estimating the observable population size from biased samples: a new approach to population estimation with capture heterogeneity

Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the…

Methodology · Statistics 2016-06-08 James E. Johndrow, Kristian Lum, Daniel Manrique-Vallier

Spatio-Temporal Mixed Models to Predict Coverage Error Rates at Local Areas

Despite the great efforts made during censuses, some nonsampling errors, such as coverage error, are inevitable. Coverage error, which can be classified into two types, undercount and overcount, occurs when there is no unique…

Applications · Statistics 2019-10-15 Sepideh Mosaferi

Dual system estimation using mixed effects loglinear models

In official statistics, dual system estimation (DSE) is a well-known tool to estimate the size of a population. Two sources are linked, and the number of units that are missed by both sources is estimated. Often dual system estimation is…

Methodology · Statistics 2025-05-05 Ceejay Hammond, Paul A. Smith, Peter G. M. van der Heijden

Fully Synthetic Data for Complex Surveys

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our…

Methodology · Statistics 2024-04-30 Shirley Mathur, Yajuan Si, Jerome P. Reiter

Synthetic Census Data Generation via Multidimensional Multiset Sum

The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are…

Computers and Society · Computer Science 2025-10-02 Cynthia Dwork, Kristjan Greenewald, Manish Raghavan

The Central Role of the Identifying Assumption in Population Size Estimation

The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data…

Methodology · Statistics 2022-06-22 Serge Aleshin-Guendel, Mauricio Sadinle, Jon Wakefield

Statistical Imaginaries, State Legitimacy: Grappling with the Arrangements Underpinning Quantification in the U.S. Census

Over the last century, the adoption of novel scientific methods for conducting the U.S. census has been met with wide-ranging receptions. Some methods were quietly embraced, while others sparked decades-long controversies. What accounts for…

Computers and Society · Computer Science 2026-02-24 Jayshree Sarathy, danah boyd

Dual Regression

We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the…

Methodology · Statistics 2018-09-26 Richard Spady, Sami Stouli

A regression approach to the two-dataset problem

This paper considers the two-dataset problem, where data are collected from two potentially different populations sharing common aspects. This problem arises when data are collected by two different types of researchers or from two…

Methodology · Statistics 2022-09-27 Steven N. MacEachern, Koji Miyawaki

Double Robust Mass-Imputation with Matching Estimators

This paper proposes using a method named Double Score Matching (DSM) to do mass-imputation and presents an application to make inferences with a nonprobability sample. DSM is a $k$-Nearest Neighbors algorithm that uses two balance scores…

Methodology · Statistics 2021-10-19 Ali Furkan Kalay

Generating Synthetic Population

In this paper, we provide a method to generate a synthetic population at various administrative levels for a country like India. This synthetic population is created using machine learning and statistical methods applied to survey data such…

Computers and Society · Computer Science 2024-05-17 Bhavesh Neekhra, Kshitij Kapoor, Debayan Gupta

Dynamic Density Estimation in Heterogeneous Cell Populations

Multicellular systems play a key role in bioprocess and biomedical engineering. Cell ensembles encountered in these setups show phenotypic variability like size and biochemical composition. As this variability may result in undesired…

Systems and Control · Computer Science 2018-07-16 Armin Küper, Robert Dürr, Steffen Waldherr

On the Population Size Estimation from Dual-record System: Profile-Likelihood Approaches

Motivated by various applications, we consider the problem of estimating the size (N) of a homogeneous human population from a dual-record system (DRS) (equivalently, a two-sample capture-recapture experiment). The likelihood estimate from the…

Methodology · Statistics 2015-04-07 Kiranmoy Chatterjee, Diganta Mukherjee

Data Fusion for High-Resolution Estimation

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…

Methodology · Statistics 2025-08-21 Amy Guan, Marissa Reitsma, Roshni Sahoo, Joshua Salomon, Stefan Wager

An Evaluation of Bounding Approaches for Generalization

Statisticians have recently developed propensity score methods to improve generalizations from randomized experiments that do not employ random sampling. However, these methods typically rely on assumptions whose plausibility may be…

Methodology · Statistics 2019-11-14 Wendy Chan

Statistical Tests for Replacing Human Decision Makers with Algorithms

This paper proposes a statistical framework of using artificial intelligence to improve human decision making. The performance of each human decision maker is benchmarked against that of machine predictions. We replace the diagnoses made by…

Econometrics · Economics 2024-12-10 Kai Feng, Han Hong, Ke Tang, Jingyuan Wang

Approximate Bayesian Solution for Estimating Population Size from Dual-record System

For the dual-record system, in the context of human populations, the popular Chandrasekar-Deming model incorporates only the time variation effect on capture probabilities. However, in practice the population may undergo behavioral change after…

Methodology · Statistics 2015-04-30 Kiranmoy Chatterjee, Diganta Mukherjee