Related papers: Alternative formulas for synthetic dual system est…

200 papers

Dual system estimation (DSE) is heavily used in Census Bureau operations. With DSE, it is important to have methods for inferring the population size among those with missing data from one or both data sources. The use of…

Computation · Statistics 2026-05-27 Zhiyuan Lu

Binary observations are often repeated to improve data quality, creating technical replicates. Several scoring methods are commonly used to infer the actual individual state and obtain a probability for each state. The common practice of…

Methodology · Statistics 2025-01-24 Manuela Royer-Carenzi, Hadrien Lorenzo, Pierre Pudlo

Measuring average differences in an outcome across racial or ethnic groups is a crucial first step for equity assessments, but researchers often lack access to data on individuals' races and ethnicities to calculate them. A common solution…

Methodology · Statistics 2024-03-12 Benjamin Lu, Jia Wan, Derek Ouyang, Jacob Goldin, Daniel E. Ho

Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the…

Methodology · Statistics 2016-06-08 James E. Johndrow, Kristian Lum, Daniel Manrique-Vallier

Despite great efforts during censuses, some nonsampling errors, such as coverage error, are inevitable. Coverage error, which can be classified as either undercount or overcount, occurs when there is no unique…

Applications · Statistics 2019-10-15 Sepideh Mosaferi

In official statistics, dual system estimation (DSE) is a well-known tool to estimate the size of a population. Two sources are linked, and the number of units that are missed by both sources is estimated. Often dual system estimation is…

Methodology · Statistics 2025-05-05 Ceejay Hammond, Paul A. Smith, Peter G. M. van der Heijden

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our…

Methodology · Statistics 2024-04-30 Shirley Mathur, Yajuan Si, Jerome P. Reiter

The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are…

Computers and Society · Computer Science 2025-10-02 Cynthia Dwork, Kristjan Greenewald, Manish Raghavan

The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data…

Methodology · Statistics 2022-06-22 Serge Aleshin-Guendel, Mauricio Sadinle, Jon Wakefield

Over the last century, the adoption of novel scientific methods for conducting the U.S. census has been met with wide-ranging receptions. Some methods were quietly embraced, while others sparked decades-long controversies. What accounts for…

Computers and Society · Computer Science 2026-02-24 Jayshree Sarathy, danah boyd

We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the…

Methodology · Statistics 2018-09-26 Richard Spady, Sami Stouli

This paper considers the two-dataset problem, where data are collected from two potentially different populations sharing common aspects. This problem arises when data are collected by two different types of researchers or from two…

Methodology · Statistics 2022-09-27 Steven N. MacEachern, Koji Miyawaki

This paper proposes using a method named Double Score Matching (DSM) to do mass-imputation and presents an application to make inferences with a nonprobability sample. DSM is a $k$-Nearest Neighbors algorithm that uses two balance scores…

Methodology · Statistics 2021-10-19 Ali Furkan Kalay

In this paper, we provide a method to generate a synthetic population at various administrative levels for a country like India. This synthetic population is created using machine learning and statistical methods applied to survey data such…

Computers and Society · Computer Science 2024-05-17 Bhavesh Neekhra, Kshitij Kapoor, Debayan Gupta

Multicellular systems play a key role in bioprocess and biomedical engineering. Cell ensembles encountered in these setups show phenotypic variability like size and biochemical composition. As this variability may result in undesired…

Systems and Control · Computer Science 2018-07-16 Armin Küper, Robert Dürr, Steffen Waldherr

Motivated by various applications, we consider the problem of estimating the size (N) of a homogeneous human population from a dual-record system (DRS) (equivalently, a two-sample capture-recapture experiment). The likelihood estimate from the…

Methodology · Statistics 2015-04-07 Kiranmoy Chatterjee, Diganta Mukherjee

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…

Methodology · Statistics 2025-08-21 Amy Guan, Marissa Reitsma, Roshni Sahoo, Joshua Salomon, Stefan Wager

Statisticians have recently developed propensity score methods to improve generalizations from randomized experiments that do not employ random sampling. However, these methods typically rely on assumptions whose plausibility may be…

Methodology · Statistics 2019-11-14 Wendy Chan

This paper proposes a statistical framework for using artificial intelligence to improve human decision making. The performance of each human decision maker is benchmarked against that of machine predictions. We replace the diagnoses made by…

Econometrics · Economics 2024-12-10 Kai Feng, Han Hong, Ke Tang, Jingyuan Wang

For the dual-record system, in the context of human populations, the popular Chandrasekar-Deming model incorporates only the time variation effect on capture probabilities. However, in practice the population may undergo behavioral change after…

Methodology · Statistics 2015-04-30 Kiranmoy Chatterjee, Diganta Mukherjee