Applied Statistics
Among 755,004 Philadelphia landlord-tenant records filed during 1969-2022, 396,163 residential cases involve tenants who appear exactly once in the observed docket. In unadjusted comparisons, single-appearance cases handled by high-volume…
We develop Wasserstein-based hypothesis tests for empirical-measure convergence in stationary dependent sequences. For a known candidate invariant measure, $\mu$, we study the statistic $T_n=\sqrt{n}\,W_1(\hat\mu_n,\mu)$ and establish…
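For a scalar state space, $W_1$ between two distributions equals the integral of the absolute difference of their CDFs, $W_1(\hat\mu_n,\mu)=\int|\hat F_n(x)-F(x)|\,dx$. The sketch below uses that identity to compute $T_n$ for a simulated stationary AR(1) sequence, whose invariant law is known to be $N(0, 1/(1-\phi^2))$. It is a minimal illustration under stated assumptions: the chain is one-dimensional, the integral is approximated by a Riemann sum on a truncated grid, and only the statistic is computed. Calibrating critical values under dependence, which the abstract addresses, is not reproduced, and all function names are hypothetical.

```python
import math

import numpy as np


def w1_to_known_cdf(sample, cdf, lo, hi, grid_size=20001):
    """Approximate W_1(empirical measure of `sample`, mu) for a scalar mu with CDF `cdf`.

    Uses the 1-D identity W_1 = integral of |F_n(x) - F(x)| dx, evaluated by a
    Riemann sum on [lo, hi]; any mass outside [lo, hi] is ignored (assumption).
    """
    x = np.linspace(lo, hi, grid_size)
    s = np.sort(np.asarray(sample, dtype=float))
    f_n = np.searchsorted(s, x, side="right") / s.size  # empirical CDF at each grid point
    dx = (hi - lo) / (grid_size - 1)
    return float(np.sum(np.abs(f_n - cdf(x))) * dx)


def t_statistic(sample, cdf, lo, hi):
    """T_n = sqrt(n) * W_1(mu_hat_n, mu)."""
    return math.sqrt(len(sample)) * w1_to_known_cdf(sample, cdf, lo, hi)


def simulate_ar1(n, phi, rng):
    """Stationary AR(1): X_t = phi * X_{t-1} + e_t, e_t ~ N(0, 1), started in its invariant law."""
    sd = 1.0 / math.sqrt(1.0 - phi**2)
    x = np.empty(n)
    x[0] = rng.normal(scale=sd)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


def normal_cdf(mean, sd):
    """Return the CDF of N(mean, sd^2) as a vectorised function."""
    erf = np.vectorize(math.erf)
    return lambda x: 0.5 * (1.0 + erf((x - mean) / (sd * math.sqrt(2.0))))


rng = np.random.default_rng(0)
phi = 0.5
sd = 1.0 / math.sqrt(1.0 - phi**2)
xs = simulate_ar1(5000, phi, rng)
t_true = t_statistic(xs, normal_cdf(0.0, sd), -12.0, 14.0)   # candidate = true invariant law
t_wrong = t_statistic(xs, normal_cdf(2.0, sd), -12.0, 14.0)  # candidate shifted by 2
print(f"T_n under true mu: {t_true:.2f}, under shifted mu: {t_wrong:.2f}")
```

Under the correct candidate $T_n$ stays of order one, while a misspecified candidate makes it grow like $\sqrt{n}$, which is what gives a test based on $T_n$ its power.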
The fate of cities under natural hazards depends not only on hazard intensity but also on the coupling of structural damage, a collective process that remains poorly understood. Here we show that urban structural damage exhibits…
Accurate forecasts of weekly mortality are essential for public health and the insurance industry. We develop a forecasting framework that extends the Lee-Carter model with age- and region-specific seasonal effects and penalized distributed…
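The classical Lee-Carter model that this framework extends writes log death rates as $\log m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}$, with the identifying constraints $\sum_x b_x = 1$ and $\sum_t k_t = 0$. A minimal sketch of the standard rank-1 SVD fit follows. It covers only the baseline model, not the age- and region-specific seasonal effects or penalized distributed lags described in the abstract, and `fit_lee_carter` is a hypothetical name.

```python
import numpy as np


def fit_lee_carter(log_m):
    """Fit log m[x, t] = a[x] + b[x] * k[t] by the rank-1 SVD method.

    log_m: array of shape (n_ages, n_periods), e.g. periods = weeks.
    Returns (a, b, k) normalised so that sum(b) == 1 and sum(k) == 0.
    """
    a = log_m.mean(axis=1)                 # age pattern: row means over time
    centered = log_m - a[:, None]          # each row now has zero mean over t
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b = u[:, 0]
    k = s[0] * vt[0, :]                    # sums to ~0 because rows are centred
    scale = b.sum()                        # impose sum(b) = 1 (this also fixes the sign)
    return a, b / scale, k * scale


# Synthetic check: build exact rank-1 data and recover its parameters.
a_true = np.linspace(-8.0, -2.0, 5)
b_true = np.array([0.1, 0.15, 0.2, 0.25, 0.3])  # sums to 1
k_true = np.linspace(-5.0, 5.0, 10)             # sums to 0
log_m = a_true[:, None] + np.outer(b_true, k_true)
a_hat, b_hat, k_hat = fit_lee_carter(log_m)
print(np.round(b_hat, 3))
```

Seasonal or regional terms would enter as additional components of the linear predictor; the SVD step above only captures the single shared time index $k_t$.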
A popular quantitative approach to evaluating player performance in sports involves comparing an observed outcome to the expected outcome ignoring player involvement, which is estimated using statistical or machine learning methods. In…
Open burning of plastic waste may pose a significant threat to global health by degrading air quality, but quantitative research on this problem, crucial for policy making, has been stunted by a lack of data. Many low- and middle-income…
This research note investigates the impact of the experience museum Sensoria, opened in September 2024 in Holzminden, Germany, on local tourism demand and related direct and indirect effects. To this end, the study employs a novel approach…
Bitcoin's price has been described as following a power law (PL) in time, $P \sim t^{\beta}$ with $\hat\beta \approx 5.7$ over 2010-2026. We test this claim using the Clauset-Shalizi-Newman protocol applied to Bitcoin's tail-relevant…
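A power law in time is linear on log-log axes, $\log P = \log c + \beta \log t$, so an exponent such as $\hat\beta \approx 5.7$ is typically read off as the slope of an ordinary least-squares fit. The sketch below shows only that naive estimate on synthetic data. It is not the Clauset-Shalizi-Newman protocol, which instead fits tail exponents by maximum likelihood above a cutoff chosen by minimising a Kolmogorov-Smirnov distance and compares alternative models by likelihood ratios. The time origin for $t$ is an assumption of the sketch, and changing it changes $\hat\beta$.

```python
import numpy as np


def loglog_slope(t, price):
    """OLS slope and intercept of log(price) on log(t): the naive power-law fit."""
    slope, intercept = np.polyfit(np.log(t), np.log(price), 1)  # highest degree first
    return slope, intercept


# Synthetic series: exact power law with beta = 5.7 times multiplicative noise.
# t counts days from a hypothetical origin; the real choice of origin matters.
rng = np.random.default_rng(1)
t = np.arange(500, 6000, dtype=float)
price = 1e-17 * t**5.7 * np.exp(rng.normal(scale=0.3, size=t.size))
beta_hat, log_c = loglog_slope(t, price)
print(f"beta_hat = {beta_hat:.3f}")
```

A good OLS fit on log-log axes is weak evidence on its own, since many smooth growth curves look nearly linear over a limited range of $\log t$; that is the gap a formal protocol is meant to close.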
Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However,…
Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian…
We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized upstream by…
Net benefit is widely used and reported to evaluate the clinical utility of prediction models, yet its interpretation often remains difficult in practice. In this didactic note, we develop two complementary interpretations that make net…
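For reference, the standard decision-curve definition of net benefit at threshold probability $p_t$ is $\mathrm{NB}(p_t) = \frac{TP}{n} - \frac{FP}{n}\cdot\frac{p_t}{1-p_t}$, where the odds $p_t/(1-p_t)$ weight each false positive relative to a true positive. The sketch below computes it together with the "treat all" reference strategy. It assumes that patients with predicted risk at or above $p_t$ are classified as positive; the function names are hypothetical, and the note's two interpretations are not reproduced here.

```python
import numpy as np


def net_benefit(y, risk, p_t):
    """Net benefit of treating patients whose predicted risk is >= p_t.

    y: 0/1 outcomes; risk: predicted probabilities; 0 < p_t < 1.
    """
    y = np.asarray(y)
    treat = np.asarray(risk) >= p_t          # assumed classification rule
    n = y.size
    tp = np.sum(treat & (y == 1))            # treated patients who have the outcome
    fp = np.sum(treat & (y == 0))            # treated patients who do not
    return tp / n - fp / n * p_t / (1.0 - p_t)


def net_benefit_treat_all(y, p_t):
    """Reference strategy: treat everyone, so TP/n equals the prevalence."""
    prevalence = np.mean(y)
    return prevalence - (1.0 - prevalence) * p_t / (1.0 - p_t)


y = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.6, 0.1]
print(net_benefit(y, risk, 0.2), net_benefit(y, risk, 0.5), net_benefit_treat_all(y, 0.2))
```

Worked by hand for $p_t = 0.2$: three patients are treated, two of them true positives, so $\mathrm{NB} = 2/4 - (1/4)(0.2/0.8) = 0.4375$, compared with $0.375$ for treating everyone.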
OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare as well as introduce an open-source software package that enables more reproducible methods development relevant…
This paper investigates a recursive formulation of auto-regressive multi-fidelity Gaussian process regression in the challenging setting of noisy and non-nested high- and low-fidelity data. We propose a decoupled optimization strategy based…
The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar…
Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical…
Risk management is an important part of financial practice, essential for protecting assets and investments in modern-day volatile markets. This paper proposes a mixture of mirrored Weibull (MMW) distribution for modelling stock returns and…
Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X…
Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make exact model versions easier to preserve. I…
Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized…