Related papers: Collaborative causal inference with a distributed …

Privacy enhanced collaborative inference in the Cox proportional hazards model for distributed data

Data sharing barriers are paramount challenges arising from multicenter clinical studies where multiple data sources are stored in a distributed fashion at different local study sites. Particularly in the case of time-to-event analysis when…

Applications · Statistics 2024-09-10 Mengtong Hu, Xu Shi, Peter X.-K. Song

Data Integration in Causal Inference

Integrating data from multiple heterogeneous sources has become increasingly popular as a way to achieve a large sample size and a diverse study population. This paper reviews developments in causal inference methods that combine multiple datasets…

Methodology · Statistics 2021-10-05 Xu Shi, Ziyang Pan, Wang Miao

Collaborative Heterogeneous Causal Inference Beyond Meta-analysis

Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution…

Machine Learning · Statistics 2024-04-25 Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan

A Communication-Efficient Distributed Algorithm for Learning with Heterogeneous and Structurally Incomplete Multi-Site Data

In multicenter biomedical research, integrating data from multiple decentralized sites provides more robust and generalizable findings due to its larger sample size and the ability to account for the between-site heterogeneity. However,…

Methodology · Statistics 2025-12-29 Xiaokang Liu, Yuchen Yang, Yifei Sun, Jiang Bian, Yanyuan Ma, Raymond J. Carroll, Yong Chen

Mining Combined Causes in Large Data Sets

In recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e. a…

Artificial Intelligence · Computer Science 2015-10-16 Saisai Ma, Jiuyong Li, Lin Liu, Thuc Duy Le

Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift

Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research,…

Methodology · Statistics 2025-05-16 Yi Liu, Alexander W. Levis, Ke Zhu, Shu Yang, Peter B. Gilbert, Larry Han

Federated Causal Discovery From Interventions

Causal discovery plays a pivotal role in mitigating model uncertainty by recovering the underlying causal mechanisms among variables. In many practical domains, such as healthcare, access to the data gathered by individual entities is…

Machine Learning · Computer Science 2024-02-13 Amin Abyaneh, Nino Scherrer, Patrick Schwab, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou

Identification and Estimation of Causal Effects from Dependent Data

The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. Nevertheless, in some settings, such as social networks, infectious disease modeling, and…

Methodology · Statistics 2019-02-06 Eli Sherman, Ilya Shpitser

Federated Estimation of Causal Effects from Observational Data

Many modern applications collect data in a federated fashion, with data kept locally and undisclosed. To date, most causal inference methods require data to be stored in a central repository. We present a novel framework…

Methodology · Statistics 2021-06-02 Thanh Vinh Vo, Trong Nghia Hoang, Young Lee, Tze-Yun Leong

Federated Causal Inference from Observational Data

Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data…

Machine Learning · Computer Science 2024-05-31 Thanh Vinh Vo, Young Lee, Tze-Yun Leong

Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., N >> p, in the generalized linear models framework. When such datasets are too big to be analyzed…

Methodology · Statistics 2020-07-23 Lu Tang, Ling Zhou, Peter X.-K. Song

Causal aggregation: estimation and inference of causal effects by constraint-based data fusion

In causal inference, it is common to estimate the causal effect of a single treatment variable on an outcome. However, practitioners may also be interested in the effect of simultaneous interventions on multiple covariates of a fixed target…

Methodology · Statistics 2022-11-24 Jaime Roquero Gimenez, Dominik Rothenhäusler

A Data-Driven Two-Phase Multi-Split Causal Ensemble Model for Time Series

Causal inference is a fundamental research topic for discovering the cause-effect relationships in many disciplines. However, not all algorithms are equally well-suited for a given dataset. For instance, some approaches may only be able to…

Machine Learning · Computer Science 2024-03-11 Zhipeng Ma, Marco Kemmerling, Daniel Buschmann, Chrismarie Enslin, Daniel Lütticke, Robert H. Schmitt

A selective review on calibration information from similar studies based on parametric likelihood or empirical likelihood

In multi-center clinical trials, individual-level data are, for various reasons, strictly restricted from public access. Instead, summarized information is widely available from published results. With the advance of…

Methodology · Statistics 2021-01-05 Jing Qin, Yukun Liu, Pengfei Li

Collaborative causal inference on distributed data

In recent years, the development of technologies for causal inference with privacy preservation of distributed data has gained considerable attention. Many existing methods for distributed data focus on resolving the lack of subjects…

Methodology · Statistics 2024-01-18 Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai

Debiased Collaborative Filtering with Kernel-Based Causal Balancing

Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. To solve this problem, one of the simple and effective methods is based on the propensity score, which…

Information Retrieval · Computer Science 2024-05-01 Haoxuan Li, Chunyuan Zheng, Yanghao Xiao, Peng Wu, Zhi Geng, Xu Chen, Peng Cui

Data collaboration for causal inference from limited medical testing and medication data

Observational studies enable causal inferences when randomized controlled trials (RCTs) are not feasible. However, integrating sensitive medical data across multiple institutions introduces significant privacy challenges. The data…

Methodology · Statistics 2025-03-24 Tomoru Nakayama, Yuji Kawamata, Akihiro Toyoda, Akira Imakura, Rina Kagawa, Masaru Sanuki, Ryoya Tsunoda, Kunihiro Yamagata, Tetsuya Sakurai, Yukihiko Okada

Data collaboration analysis for distributed datasets

In this paper, we propose a data collaboration analysis method for distributed datasets. The proposed method is a centralized machine learning approach, while the training datasets and models remain distributed across several institutions. Recently, data…

Machine Learning · Computer Science 2019-02-21 Akira Imakura, Tetsuya Sakurai

Calibrated regression estimation using empirical likelihood under data fusion

Data analysis based on information from several sources is common in economic and biomedical studies. This setting is often referred to as the data fusion problem, which differs from traditional missing data problems since no complete data…

Methodology · Statistics 2022-04-07 Wei Li, Shanshan Luo, Wangli Xu

Federated Causal Inference in Heterogeneous Observational Data

We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may…

Machine Learning · Computer Science 2023-04-04 Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey