应用统计
Objectives: Prior event rate ratio (PERR) is a method shown to perform well in mitigating confounding in real-world evidence research but it depends on several model assumptions. We propose an analytic strategy to correct biases arising…
Variables contained within the global oceans can detect and reveal the effects of the warming climate as the oceans absorb huge amounts of solar energy. Hence, information regarding the joint spatial distribution of ocean variables is…
Background: The E-value has become widely used for assessing robustness to unmeasured confounding in observational studies, but the original framework was developed for single time-point exposure-outcome settings. This study extends the…
Estimation of the population total of a variable can be improved by calibration on a set of auxiliary variables. It is difficult to establish that such a set of variables is sufficient, that estimation could not be improved by calibration…
When it is not feasible to conduct randomized controlled trials (RCTs), the use of external control arms based on real-world data (RWD) may be a viable option. However, challenges arising from data heterogeneity must be addressed to ensure…
Multiple long-term conditions (MLTC) are increasingly observed in clinical practice globally. Clustering methods to group diseases into commonly co-occurring clusters have been of interest for further understanding of how MLTC group…
For three Winter Olympics in a row, tiny nation Norway has out-medalled everyone else, in 2026 winning 18 golds, 12 silvers, 11 bronzes, i.e.~41 medals, compared to e.g.~12 + 12 + 9 = 33 for the USA, 10 + 6 + 14 = 30 for home team Italy, 8…
Predictive hotspot mapping is an important problem in crime prediction and control. An accurate hotspot mapping helps in appropriately targeting the available resources to manage crime in cities. With an aim to make data-driven decisions…
Ambient air pollution measurements from regulatory monitoring networks are routinely used to support epidemiologic studies and environmental policy decision making. However, regulatory monitors are spatially sparse and preferentially…
The Kaplan-Meier estimate, also known as the product-limit method (PLM), is a widely used non-parametric maximum likelihood estimator (MLE) in survival analysis. In the context of highway engineering, it has been repeatedly applied to…
Payments in parametric insurance solutions are linked to an index and thus decoupled from policyholders' true losses. While this principle has appealing operational benefits compared to traditional indemnity coverage, i.e. is very efficient…
This paper analyzes the dynamics of higher education dropouts through an innovative approach that integrates recurrent events modeling and point process theory with functional data analysis. We propose a novel methodology that extends…
This paper explains how the synthpop package for R has been extended to include functions to calculate measures of identity and attribute disclosure risk for synthetic data that measure risks for the records used to create the synthetic…
Evaluating sports players based on their performance shares core challenges with evaluating healthcare providers based on patient outcomes. Drawing on recent advances in healthcare provider profiling, we cast sports player evaluation within…
Aggregation constraints, arising from geographical or sectoral division, frequently emerge in a large set of time series. Coherent forecasts of these constrained series are anticipated to conform to their hierarchical structure organized by…
Mutational signature analysis has emerged as a powerful method for uncovering the underlying biological processes driving cancer development. However, the signature extraction process, typically performed using non-negative matrix…
Ambient air pollution poses significant health and environmental challenges. Exposure to high concentrations of PM$_{2.5}$ have been linked to increased respiratory and cardiovascular hospital admissions, more emergency department visits…
Electronic health records (EHR) data have considerable variability in data completeness across sites and patients. Lack of "EHR data-continuity" or "EHR data-discontinuity", defined as "having medical information recorded outside the reach…
Traditional health authority approval for oncology drugs is based on a clinical benefit endpoint, or a valid surrogate. In 1992 the FDA created the Accelerated Approval pathway to allow for earlier approval of therapies in serious…
Effective methods for visualizing data involving multiple variables, including categorical ones, are limited. The hammock plot (Schonlau 2003) visualizes both categorical and numerical variables using parallel coordinates. We introduce the…