Applied Statistics
This paper proposes a new family of Tweedie-based ratemaking models that explicitly account for mid-term policy cancellations. Using an automobile insurance dataset from a Canadian insurer, we document a marked difference in claims…
In this paper, we propose a method to analyze correlations in pandemic-related data across different geographical regions, relying on the analysis of correlations for non-stationary time series, which are typical of pandemic data. Unlike…
When faced with new data, we often conduct a cluster analysis to obtain a better understanding of the data's structure and the archetypical samples present in the data. This process often includes visualization of the data, either as a way…
Probabilistic forecasts must sum to unity and cannot express "I don't know." Possibility theory relaxes this constraint: a subnormal distribution explicitly measures how much of the plausibility budget remains unassigned, ignorance signal…
Natural and anthropogenic disturbances are impacting the health of forests worldwide. Monitoring forest disturbances at scale is important to inform conservation efforts. Here, we present a scalable approach for country-wide mapping of…
Evaluating offensive linemen and pass rushers at the player level is difficult because observable outcomes are sparse, opponent-dependent, and strongly shaped by surrounding context. Using 2021 regular-season Hudl tracking data, we…
The main purpose of this paper is to study the dynamical behaviors of a stochastic SIS epidemic model using a mean-reverting inhomogeneous geometric Brownian motion process. First, we demonstrate the existence of a global-in-time solution and…
Traditional tennis rating systems (e.g., Elo) summarize overall player strength but do not isolate the independent value of serving. Using point-by-point data from Wimbledon and the U.S. Open, we develop serve-specific player metrics that…
Real-time probability forecasts for binary outcomes are routine in sports, online experimentation, medicine, and finance. Retrospective narratives, however, often hinge on pathwise extremes: for example, a forecast that becomes "90%…
In fisheries ecology, species abundance data are often collected by multiple surveys, each with unique characteristics. This article is motivated by a dataset of Atlantic sea scallop abundance records along the northeast coast of the United…
Educational policymakers often lack data on student outcomes where standardized tests were not administered. Machine learning can predict unobserved outcomes in target populations using source population data. However, covariate…
The estimation of inequality and poverty measures is frequently constrained by a lack of individual data. Many countries, including China, continue to report income data in the form of aggregated income shares. In this context, the Beta…
System outputs in Structural Health Monitoring (SHM), such as sensor measurements or extracted features like eigenfrequencies, are influenced not only by (potential) damage but also by environmental and operational variables (EOV).…
Drug overdose mortality in the United States exhibits strong geographic heterogeneity and complex temporal evolution, yet most spatiotemporal studies focus on trends and risks without explicitly characterizing the underlying dynamical…
We propose a multivariate, distribution-free ranking framework for comparing clustered, correlated outcomes across groups, motivated by the evaluation of state-level policy environments using county-level socioeconomic data. Using pooled…
Hazard functions play a central role in survival analysis, providing insight into the underlying risk dynamics of time-to-event data, with broad applications in medicine, epidemiology, and related fields. First-order ordinary differential…
Introduction: In the analysis of time-to-event outcomes, a mixture cure (MC) model is preferred over a standard survival model when the sample includes individuals who will never experience the event of interest. Motivated by a cohort study of…
In this paper, we demonstrate a purely Bayesian approach for estimating within-group and between-group effect sizes for learning outcomes encountered in educational research, taking naturally into account the multilevel structure of the…
Geofencing surveillance poses a dynamic spatial sampling problem. Law enforcement must establish geofence perimeters to identify a relevant suspect. This requires identifying a sampling region around a surveillance site and counting the…
We describe a Bayesian framework for an inverse problem arising from monitoring block caving operations via muon tomography. We work with a low-dimensional surface-based representation of the geometry of the block cave, which dramatically…