Related papers: Efficient Estimation Under Data Fusion

200 papers

We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of…

Methodology · Statistics 2025-04-30 Sijia Li, Peter B. Gilbert, Rui Duan, Alex Luedtke

We provide a novel characterization of semiparametric efficiency in a generic supervised learning setting where the outcome mean function -- defined as the conditional expectation of the outcome of interest given the other observed…

Methodology · Statistics 2025-04-22 Harrison H. Li

Data analysis based on information from several sources is common in economic and biomedical studies. This setting is often referred to as the data fusion problem, which differs from traditional missing data problems since no complete data…

Methodology · Statistics 2022-04-07 Wei Li, Shanshan Luo, Wangli Xu

We consider a general statistical estimation problem involving a finite-dimensional target parameter vector. Beyond an internal data set drawn from the population distribution, external information, such as additional individual data or…

Methodology · Statistics 2025-07-31 Guorong Dai, Lingxuan Shao, Jinbo Chen

Suppose one is interested in estimating causal effects in the presence of potentially unmeasured confounding with the aid of a valid instrumental variable. This paper investigates the problem of making inferences about the average treatment…

Methodology · Statistics 2020-12-15 BaoLuo Sun, Wang Miao

We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable…

Statistics Theory · Mathematics 2025-11-19 Ellen Graham, Marco Carone, Andrea Rotnitzky

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…

Methodology · Statistics 2025-08-21 Amy Guan, Marissa Reitsma, Roshni Sahoo, Joshua Salomon, Stefan Wager

Suppose we have individual data from an internal study and various summary statistics from relevant external studies. External summary statistics have the potential to improve statistical inference for the internal population; however, it…

Methodology · Statistics 2026-02-06 Wenjie Hu, Ruoyu Wang, Wei Li, Wang Miao

Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research,…

Methodology · Statistics 2025-05-16 Yi Liu, Alexander W. Levis, Ke Zhu, Shu Yang, Peter B. Gilbert, Larry Han

We propose a semiparametric data fusion framework for efficient inference on survival probabilities by integrating right-censored and current status data. Existing data fusion methods focus largely on fusing right-censored data only, while…

Methodology · Statistics 2025-09-15 Xiudi Li, Sijia Li

Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the…

Applications · Statistics 2014-12-23 Quentin Berthet, Venkat Chandrasekaran

Many statistical estimands of interest (e.g., in regression or causality) are functions of the joint distribution of multiple random variables. But in some applications, data is not available that measures all random variables on each…

Methodology · Statistics 2025-02-11 Yicong Jiang, Lucas Janson

This paper investigates the problem of making inference about a parametric model for the regression of an outcome variable $Y$ on covariates $(V,L)$ when data are fused from two separate sources, one which contains information only on $(V,…

Methodology · Statistics 2020-12-15 Katherine Evans, BaoLuo Sun, James Robins, Eric J. Tchetgen Tchetgen

For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data…

Machine Learning · Computer Science 2015-02-09 Marinka Žitnik, Blaž Zupan

In this paper we propose an extension of the notion of deviation-based aggregation function tailored to aggregate multidimensional data. Our objective is both to improve the results obtained by other methods that try to select the best…

Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity…

Methodology · Statistics 2022-10-06 Emily C. Hector

Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are…

Methodology · Statistics 2024-06-11 Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban

In the era of big data, the increasing availability of diverse data sources has driven interest in analytical approaches that integrate information across sources to enhance statistical accuracy, efficiency, and scientific insights. Many…

Methodology · Statistics 2026-03-30 Lu Wang, Yanyuan Ma, Jiwei Zhao

In the era of big data, the explosive growth of multi-source heterogeneous data offers many exciting challenges and opportunities for improving the inference of conditional average treatment effects. In this paper, we investigate…

Machine Learning · Statistics 2022-11-02 Xinyu Li, Yilin Li, Qing Cui, Longfei Li, Jun Zhou

We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative…

Methodology · Statistics 2022-07-28 Emily C. Hector, Peter X.-K. Song