Applied Statistics
Modern weather stations in Germany record daily temperatures every 10 minutes, whereas measurements from historical reference periods are often only available at much coarser temporal resolutions, typically hourly. This discrepancy must be…
Understanding how decision makers balance operational efficiency with environmental and ecological risks is central to vessel navigation. We model vessel speed as a control variable in a constrained optimization framework in which vessel…
Metabolic syndrome is a complex clinical condition characterized by the simultaneous presence of multiple metabolic risk factors and represents a major public health concern. The syndrome develops silently and may remain undiagnosed for…
We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme…
Small-area precipitation forecasts support real-time decisions for reservoir operation, irrigation planning, drought monitoring, and flash-flood response. Operational value depends not only on point accuracy, but also on calibrated…
Climate change is expected to significantly affect the physical, financial, and economic environments over the long term, posing risks to the financial health of general insurers. While general insurers typically use Dynamic Financial…
In multi-modal biomedical research, integrating high-dimensional genomic data with clinical baselines is essential for precision medicine. However, standard deep neural network approaches often entangle these modalities, obscuring the…
We analyze a fixed panel of S&P 500 stocks from 1996 to 2026 using complementary static and kinetic Ising models applied to daily binary open-to-close movements. The static pairwise model provides a long-run maximum-entropy summary of…
This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and…
Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle…
Model-assisted interval designs such as the Keyboard design are transparent and easy to implement in phase I oncology trials. However, interim decisions based solely on data from the current dose may overlook informative signals from…
We analyze downstream courtroom governance in Philadelphia eviction cases using 755,004 Municipal Court landlord-tenant records filed from 1969 through 2022. Post-filing case processing is organized by repeated courtroom relationships,…
Introduction: Logistic regression (LR)-type model limitations for causal inference are explained theoretically and empirically through the lens of the purported gateway effect from e-cigarette use to smoking. Previous studies have reported…
Real-World Data (RWD), with its large sample sizes and rich clinical detail, offers a compelling alternative to randomized controlled trials (RCTs) for studying treatment effects in diverse and complex patient populations. However, its…
This study presents the development and application of a scalable non-ergodic ground motion model (NGMM) for the Los Angeles area. The NGMM is trained and validated on physics-based simulated ground-motion data from a recent Statewide…
Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for…
Heritability is a central concept in the long-standing debate about nature versus nurture in biological and social sciences. However, existing notions of heritability are based on strong assumptions and do not use explicit causal models. We…
Regularized Adjusted Plus-Minus (RAPM) is the standard framework for estimating individual player impact in basketball. Its application requires possession-level stint data -- records of which five players shared the court for each…
Previous comparisons of ordinary least squares with Newey-West standard errors (OLS-NW) and Prais-Winsten (PW) regression in multiple-group interrupted time series analysis have been limited to first-order autoregressive (AR[1]) errors…
AI agents increasingly execute procedural workflows as sequential action traces, which obscures latent concurrency and induces repeated step-by-step reasoning. We introduce BPOP, a Bayesian framework that infers a latent dependency partial…