Statistical Methodology
Causal mediation analysis has been extended to estimate path-specific effects with multiple intermediate variables, isolating treatment effects through a mediator of interest while excluding pathways through its ancestors. Such analyses…
Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM; LaMar, 2018),…
Identifying covariates that modify treatment effects is a central problem in causal inference. Yet existing data-adaptive procedures do not provide finite-sample control over the expected number of false discoveries, risking spurious…
We study target-population quantile treatment effects when a source study may have unmeasured treatment confounding and may not transport to a target population after conditioning on observed covariates. The observed data consist of a…
This paper develops semiparametric theory for counterfactual distribution, quantile, and lower-tail risk processes under unmeasured confounding using proximal negative-control proxies. Rather than treating each threshold as a separate…
Randomization tests and flexible treatment-effect models offer complementary strengths for analyzing data from randomized panel experiments: the former provide valid inference under the known assignment mechanism, while the latter can…
Clustering high-dimensional data is especially challenging when cluster distributions are heavy tailed and only approximately elliptical. Existing high-dimensional methods are largely built for Gaussian or other light-tailed models, whereas…
Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods…
We consider a broad class of semiparametric regression models in which the conditional distribution of the response takes the form $f\{Y \mid \mathbf{x}^{\mathrm{T}}\boldsymbol{\beta}+m(z), \phi\}$, which is known up to a parametric component…
Multi-institutional electronic health record (Multi-EHR) data have emerged as a powerful resource for developing predictive models to support clinical decisions and for generating reliable real-world evidence. By aggregating information…
Human activity spaces are shaped by individual mobility and the built environment, motivating statistical methods that integrate GPS observations with GIS representations of places and routes. We propose a novel methodology to estimate…
We propose and analyse rolling-origin conformal prediction for time-series forecasting. The method calibrates the conformal quantile against the $m$ most recent pseudo-out-of-sample forecast errors, adapting to serial dependence, volatility…
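The calibration step described in this abstract can be illustrated with a minimal sketch. The abstract is truncated, so this is a generic rolling-window conformal interval and not necessarily the authors' procedure. The function names, the use of absolute errors as the conformity score, and the symmetric interval are all assumptions made for illustration.

```python
import math


def rolling_conformal_halfwidth(errors, m, alpha=0.1):
    """Conformal quantile of the m most recent absolute forecast errors.

    `errors` holds past pseudo-out-of-sample forecast errors, oldest first.
    The name and signature are hypothetical, chosen for this sketch.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # Keep only the m most recent errors and score them by absolute value.
    window = sorted(abs(e) for e in errors[-m:])
    n = len(window)
    if n == 0:
        raise ValueError("need at least one past forecast error")
    # Finite-sample conformal rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    # If the rank exceeds the window size, no finite quantile is available.
    if k > n:
        return math.inf
    return window[k - 1]


def rolling_conformal_interval(point_forecast, errors, m, alpha=0.1):
    """Symmetric interval around a point forecast (an assumed form)."""
    q = rolling_conformal_halfwidth(errors, m, alpha)
    return point_forecast - q, point_forecast + q


if __name__ == "__main__":
    # Made-up errors purely to exercise the functions.
    past_errors = [0.5, -1.0, 2.0, -0.2, 0.3, 1.5, -0.7, 0.1, 0.9, -0.4]
    print(rolling_conformal_interval(10.0, past_errors, m=5, alpha=0.2))
```

Re-running the calibration at each forecast origin with the latest $m$ errors lets the interval width track recent forecast accuracy. A small $m$ reacts faster to regime changes but limits how small `alpha` can be before the half-width becomes infinite.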
Pragmatic trials increasingly define outcomes using real-world data such as electronic health records, where assessments are collected during routine care rather than at fixed timepoints. Consequently, these uncontrolled assessments may be…
We propose a joint individualized hurdle-ordinal regression model for paired zero-inflated ordinal outcomes with subject-specific, spatially varying, and time-varying covariate effects, motivated by the Iowa Fluoride Study (IFS). The two…
Modern heterogeneity-robust difference-in-differences estimators derive their asymptotic properties under iid, cluster, or fixed-design frameworks that abstract from complex survey sampling, yet practitioners routinely apply them to…
Reliable inference for spatial regression remains challenging because it requires the correct specification of the spatial dependence structure, the mean trend, and the error distribution. Existing parametric testing methods rely on…
Evidence accumulation models (EAMs) provide a powerful framework for inferring latent cognitive processes from choice and reaction time data. While EAMs are traditionally limited to binary choices, recent developments have extended them to…
Synthetic tabular data are often evaluated by distributional similarity, privacy distance, or train-on-synthetic-test-on-real predictive performance, but these criteria do not ensure validity for causal inference. We show that fully…
Subnational monitoring of public health often relies on household surveys where data are sparse at the desired spatial resolution. Small area estimation (SAE) methods address this challenge by borrowing strength across areas and…
This paper studies a structural failure of subsample-based estimation in dynamic time series models. Even under oracle knowledge of contamination locations, removing contaminated observations does not restore the uncontaminated objective.…