Statistics Methodology
It has been frequently observed that Neyman orthogonality, the central device underlying double/debiased machine learning (Chernozhukov et al., 2018), and pathwise differentiability, a cornerstone concept from semiparametric theory, often…
Standard practice in electronic health record (EHR)-based studies evaluating the comparative effectiveness of bariatric surgery relative to no surgery is to estimate and report a constant treatment effect across calendar time. However,…
Bayesian hypothesis testing via Bayes factors offers a principled alternative to classical p-value methods in meta-analysis, particularly suited to its cumulative and sequential nature. Unlike commonly reported p-values for standard null…
Modern biomedical studies frequently collect complex, high-dimensional physiological signals using wearables and sensors along with time-to-event outcomes, making efficient variable selection methods crucial for interpretation and improving…
Evaluation of clinical prediction models across multiple clusters, whether centers or datasets, is becoming increasingly common. A comprehensive evaluation includes an assessment of the agreement between the estimated risks and the observed…
We make the case for incorporating a notion of time into causal directed acyclic graphs (DAGs). We demonstrate that nontemporal causal DAGs are ambiguous and obstruct justification of the acyclicity assumption. Assuming that causes precede…
In many fields, researchers aim to identify significant associations between a set of explanatory variables and a response while controlling the false discovery rate (FDR). The Knockoff filter was recently proposed in the frequentist paradigm to…
Traditional statistical methods need to be updated to work with modern distributed data storage paradigms. A common approach is the split-and-conquer framework, which involves learning models on local machines and averaging their parameter…
Mediation analysis is widely used for exploring treatment mechanisms; however, it faces challenges when nonignorable missing confounders are present. Efficient inference of mediation effects and the efficiency loss due to nonignorable…
Time series of matrix-valued data are increasingly available in areas including economics, finance, and social science. These data may shed light on the inter-dynamical relationships between two sets of attributes, for…
Determining the number of factors in high-dimensional factor models remains a fundamental challenge, particularly when data are incomplete. This paper introduces the concept of identifiable factors, those that can be reliably recovered…
We propose an empirical Bayes framework for aggregating estimators obtained from several identification functionals associated with the same causal parameter. The central object is a posterior mean that pools a collection of asymptotically…
We discuss the regression-by-composition framework of Farewell, Daniel, Stensrud and Huitfeldt, highlighting a key consequence of its sequential construction: order dependence. Reordering the flows may change the implied conditional…
In an editorial in the Journal of Marketing, Steenkamp et al. (2026) make a valuable and timely intervention by urging marketing scholars to move beyond dichotomous significance testing and to report effect sizes that speak to substantive…
Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials, because they summarize treatment benefit through pairwise comparisons that respect the clinical importance order among…
Stepped-wedge cluster randomized trials (SW-CRTs) evaluate interventions rolled out across clusters over time. Standard analyses typically use immediate-treatment (IT) models, which assume effects begin at crossover and remain constant…
Observational data are often used to answer causal questions, yet the legitimacy of doing so is commonly argued to hinge on strong, domain-supported assumptions about the underlying causal structure, with limited guidance on how much domain…
Smoothness has long been the dominant form of parsimony in functional data analysis, to the point of occasionally being conflated with the very notion of functional data. However, many core inferential tasks depend on the inverse…
Joint models for longitudinal and time-to-event data are increasingly used in health research to characterize the association between biomarker trajectories and the risk of clinical events. However, these models usually assume a linear…
High-resolution simulation models are essential for representing complex physical systems, yet their substantial computational cost severely limits the number of feasible high-fidelity (HF) evaluations. This problem is often addressed…