Statistical Methodology
Conformal risk control is an extension of conformal prediction for controlling risk functions beyond miscoverage. The original algorithm controls the expected value of a loss that is monotonic in a one-dimensional parameter. Here, we…
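A minimal sketch of the standard one-parameter conformal risk control rule that the abstract calls "the original algorithm". This is not the extension the paper proposes. Given n calibration losses L_i(λ) that are nonincreasing in λ and bounded by B, it picks the smallest λ with (n/(n+1)) R̂_n(λ) + B/(n+1) ≤ α. The helper name `conformal_risk_control`, the grid search over λ, and the fallback to the largest grid value are illustrative assumptions.

```python
def conformal_risk_control(loss_at, lambdas, alpha, bound=1.0):
    """Smallest lambda on the grid whose adjusted calibration risk is at most alpha.

    Hypothetical helper. loss_at(lam) returns the n calibration losses L_i(lam);
    each L_i must be nonincreasing in lam and bounded above by `bound` (B).
    """
    grid = sorted(lambdas)
    for lam in grid:
        losses = loss_at(lam)
        n = len(losses)
        # n/(n+1) * empirical risk + B/(n+1) simplifies to (sum of losses + B)/(n+1)
        if (sum(losses) + bound) / (n + 1) <= alpha:
            return lam
    # Fallback: the guarantee assumes the loss at the largest lambda is at most alpha.
    return grid[-1]
```

With the miscoverage loss L_i(λ) = 1{s_i > λ} this reduces to split conformal prediction. Because the losses are monotone, taking the smallest qualifying grid value can only be more conservative than the exact infimum.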
Bonferroni's correction is a popular tool to address multiplicity but is notorious for its low power when tests are dependent. This paper proposes a practical modification of Bonferroni's correction when test statistics are jointly normal…
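The sketch below is plain Bonferroni, not the modification this paper proposes. It uses simulation to show how conservative Bonferroni becomes under positive dependence. The statistics are assumed, for illustration only, to be equicorrelated standard normals under the global null. As the correlation grows, the family-wise error rate falls well below α, and that slack is the lost power the abstract refers to. The helper names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def bonferroni_reject(z_stats, alpha=0.05):
    """Two-sided Bonferroni: reject H_i when its p-value is at most alpha / m."""
    m = len(z_stats)
    p_values = [2.0 * (1.0 - STD_NORMAL.cdf(abs(z))) for z in z_stats]
    return [p <= alpha / m for p in p_values]


def simulated_fwer(rho, m=20, alpha=0.05, reps=20_000, seed=0):
    """Monte Carlo family-wise error rate of Bonferroni under the global null.

    Assumed dependence: Z_i = sqrt(rho) * W + sqrt(1 - rho) * e_i, 0 <= rho < 1.
    """
    rng = random.Random(seed)
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / (2 * m))  # two-sided Bonferroni cutoff
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    errors = 0
    for _ in range(reps):
        w = rng.gauss(0.0, 1.0)  # shared factor inducing the correlation
        if any(abs(a * w + b * rng.gauss(0.0, 1.0)) > crit for _ in range(m)):
            errors += 1  # at least one true null rejected
    return errors / reps
```

Under independence the simulated rate sits just below α. With ρ = 0.9 it is much lower, so each test is held to a stricter cutoff than the family-wise guarantee requires.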
P-splines provide a flexible and computationally efficient smoothing framework and are commonly used for derivative estimation in functional data. Including an additive penalty term in P-splines has been shown to improve estimates of…
Modern data-driven applications increasingly rely on large, heterogeneous datasets collected across multiple sites. Differences in data availability, feature representation, and underlying populations often induce structured missingness,…
We study uplift estimation for combinatorial treatments. Uplift measures the pure incremental causal effect of an intervention (e.g., sending a coupon or a marketing message) on user behavior, modeled as a conditional individual treatment…
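To make the uplift definition concrete, here is a covariate-free sketch. It assumes a randomized experiment in which each treatment is a vector of 0/1 components, for example (coupon, message). The uplift of a combination t relative to no treatment is then E[Y | T = t] − E[Y | T = 0], and the estimate is a difference in means. The function name and record layout are hypothetical. This is not the paper's estimator, and a conditional version would replace the group means with regressions on covariates.

```python
from collections import defaultdict
from itertools import product


def combination_uplift(records, n_components):
    """Difference-in-means uplift of each treatment combination versus no treatment.

    Hypothetical helper. records: iterable of (treatment_tuple, outcome) pairs from a
    randomized experiment, with one 0/1 flag per treatment component.
    Returns {combination: mean(Y | T = combination) - mean(Y | T = all zeros)}.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for treatment, outcome in records:
        key = tuple(treatment)
        totals[key] += outcome
        counts[key] += 1
    control = (0,) * n_components
    if counts[control] == 0:
        raise ValueError("no untreated (all-zero) records to compare against")
    baseline = totals[control] / counts[control]
    return {
        combo: totals[combo] / counts[combo] - baseline
        for combo in product((0, 1), repeat=n_components)
        if combo != control and counts[combo] > 0
    }
```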
Bayes factors are widely computed by Monte Carlo, yet heavy-tailed sampling distributions can make numerical validation unreliable. The Turing–Good identities provide exact moment equalities for powers of a Bayes factor (a density ratio).…
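For a Bayes factor B = p₁/p₀, the identity E₁[Bᵏ] = E₀[Bᵏ⁺¹] follows from ∫ p₁ (p₁/p₀)ᵏ = ∫ p₀ (p₁/p₀)ᵏ⁺¹, and the case k = 0 gives E₀[B] = 1. The sketch checks both sides by Monte Carlo in an assumed toy model, H₀: X ~ N(0, 1) against H₁: X ~ N(μ, 1). In that model both sides equal exp(k(k+1)μ²/2). The function name is hypothetical, and this is not the paper's validation procedure.

```python
import math
import random


def turing_good_check(mu, k=1, n=200_000, seed=1):
    """Monte Carlo check of E_1[B^k] = E_0[B^(k+1)] for B = p1/p0.

    Assumed toy model: H0: X ~ N(0, 1), H1: X ~ N(mu, 1), so
    B(x) = exp(mu * x - mu**2 / 2) and both sides equal exp(k * (k + 1) * mu**2 / 2).
    """
    rng = random.Random(seed)

    def bayes_factor(x):
        return math.exp(mu * x - mu * mu / 2.0)

    lhs = sum(bayes_factor(rng.gauss(mu, 1.0)) ** k for _ in range(n)) / n  # draws under H1
    rhs = sum(bayes_factor(rng.gauss(0.0, 1.0)) ** (k + 1) for _ in range(n)) / n  # draws under H0
    exact = math.exp(k * (k + 1) * mu * mu / 2.0)
    return lhs, rhs, exact
```

Under H₀, Bᵏ⁺¹ is lognormal with log-variance (k+1)²μ². As μ or k grows, the right-hand average becomes dominated by rare large draws, which is the heavy-tail problem the abstract describes.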
Modern causal decision-making increasingly demands individualized treatment-effect estimation in networks where interventions are high-dimensional, combinatorial vectors. While network interference, effect heterogeneity, and…
Directional data arise in many applications where observations are naturally represented as unit vectors or as observations on the surface of a unit hypersphere. In this context, statistical depth functions provide a center-outward…
When the number of assets is larger than the sample size, the minimum variance portfolio interpolates the training data, delivering a pathological zero in-sample variance. We show that if the weights of the zero variance portfolio are learned…
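The interpolation claim can be checked directly. When the number of assets N exceeds the number of periods T, the sample covariance is singular, so there are fully invested weights w with w′Σ̂w = 0. The sketch below finds the minimum-norm such weights by solving [demeaned returns; 1′] w = [0; 1] with least squares on simulated returns. It assumes NumPy and uses a hypothetical function name. It illustrates only the degenerate in-sample solution, not the paper's learning procedure.

```python
import numpy as np


def zero_variance_portfolio(returns):
    """Minimum-norm fully invested weights with zero in-sample variance.

    returns: (T, N) array with N > T, so the sample covariance is singular.
    """
    T, N = returns.shape
    centered = returns - returns.mean(axis=0)  # demeaned returns, rank <= T - 1
    A = np.vstack([centered, np.ones((1, N))])
    b = np.zeros(T + 1)
    b[-1] = 1.0  # budget constraint: weights sum to one
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w


rng = np.random.default_rng(0)
R = rng.normal(scale=0.01, size=(50, 200))  # simulated: 50 periods, 200 assets
w = zero_variance_portfolio(R)
in_sample_var = w @ np.cov(R, rowvar=False) @ w  # zero up to rounding error
```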
We extend the knockoffs method for selecting predictors to clustered data (cross-sectional or repeated measures). In the setting of clustered data, variable selection is complex because some predictors are measured at the observation level…
Treatment effect heterogeneity is central to policy evaluation, social science, and precision medicine, where interventions can affect individuals differently. In observational studies, covariates, treatment, and outcomes are often only…
Regression discontinuity and kink designs are typically analyzed through mean effects, even when treatment changes the shape of the entire outcome distribution. To address this, we introduce distributional discontinuity designs, a framework…
We propose a localized conformal model selection framework that integrates local adaptivity with post-selection validity for distribution-free prediction. By performing model selection symmetrically across calibration points using upper and…
Mixed-effects models are fundamental tools for analyzing clustered and repeated-measures data, but existing high-dimensional methods largely focus on penalized estimation with vector-valued covariates. Bayesian alternatives in this regime…
Matched case-control studies are commonly employed in epidemiological research for their convenience and efficiency. Analysis of secondary outcomes can yield valuable insights into biological pathways and help identify genetic variants of…
Aspect-Based Sentiment Analysis (ABSA) provides a fine-grained understanding of opinions by linking sentiment to specific aspects in text. While transformer-based models excel at this task, their black-box nature limits their…
Multilayer networks have become increasingly ubiquitous across diverse scientific fields, ranging from social sciences and biology to economics and international relations. Despite their broad applications, the inferential theory for…
Background: Phase I dose-finding trials increasingly encounter delayed-onset toxicities, especially with immunotherapies and targeted agents. The time-to-event continual reassessment method (TITE-CRM) handles incomplete follow-up using…
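In its commonly used form, TITE-CRM down-weights each patient who is still toxicity-free by the fraction of the assessment window already observed. That patient then contributes log(1 − wᵢ F(dᵢ, β)) to the likelihood, while an observed toxicity contributes log F(dᵢ, β). The sketch below assumes a one-parameter power model F(d, β) = s_d^exp(β) over a skeleton s, a N(0, 1.34) prior on β, the linear weight wᵢ = min(uᵢ / window, 1), and a grid-based posterior mean plugged into the model. All names are hypothetical, and the sketch does not describe the method this paper develops.

```python
import math


def tite_crm_recommend(skeleton, doses, toxic, followup, window, target,
                       prior_var=1.34, grid_size=2001):
    """Hypothetical helper: one-parameter power-model TITE-CRM dose recommendation.

    skeleton: prior toxicity guesses per dose level, increasing and inside (0, 1).
    doses, toxic, followup: per-patient dose index, 0/1 toxicity flag, time observed.
    """
    def log_posterior(beta):
        value = -0.5 * beta * beta / prior_var  # N(0, prior_var) prior on beta
        power = math.exp(beta)
        for d, y, u in zip(doses, toxic, followup):
            log_p = power * math.log(skeleton[d])  # log F(d, beta), avoids underflow
            if y:
                value += log_p  # observed toxicity: full weight
            else:
                w = min(u / window, 1.0)  # linear weight for partial follow-up
                value += math.log(1.0 - w * math.exp(log_p))
        return value

    grid = [-6.0 + 12.0 * i / (grid_size - 1) for i in range(grid_size)]
    logs = [log_posterior(b) for b in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # unnormalized posterior on the grid
    beta_hat = sum(b * w for b, w in zip(grid, weights)) / sum(weights)
    estimates = [s ** math.exp(beta_hat) for s in skeleton]
    recommended = min(range(len(skeleton)), key=lambda j: abs(estimates[j] - target))
    return recommended, estimates
```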
Recurrent binary outcomes within individuals, such as hospital readmissions, often reflect latent risk processes that evolve over time. Conventional methods like generalized linear mixed models and generalized estimating equations estimate…
To provide a comprehensive summary of the tail distribution, the expected shortfall is defined as the average over the tail above (or below) a certain quantile of the distribution. The expected shortfall regression captures the…
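The definition in the abstract can be illustrated with the lower-tail expected shortfall at tail probability τ, ES_τ = (1/τ) ∫₀^τ F⁻¹(u) du. Its plug-in estimate averages the ⌈τn⌉ smallest observations, and the upper-tail version averages the largest ones. The helper below is a hypothetical sketch of that unconditional definition only. It does not implement expected shortfall regression, which models this tail average as a function of covariates.

```python
import math


def expected_shortfall(xs, tail_prob, upper=False):
    """Empirical expected shortfall: mean of the ceil(tail_prob * n) most extreme values.

    Hypothetical helper. Lower tail by default; upper=True averages the largest values.
    """
    if not 0 < tail_prob <= 1:
        raise ValueError("tail_prob must lie in (0, 1]")
    ordered = sorted(xs, reverse=upper)
    k = max(1, math.ceil(tail_prob * len(ordered)))
    return sum(ordered[:k]) / k
```

For the values 1, 2, …, 100 and τ = 0.1, the lower-tail estimate is the mean of 1 through 10, which is 5.5, and the upper-tail estimate is the mean of 91 through 100, which is 95.5.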