Applied Statistics
Mapping deprivation in urban areas is important, for example for identifying areas of greatest need and planning interventions. Traditional ways of obtaining deprivation estimates are based on either census or household survey data, which…
Bayesian hierarchical models are proposed for modeling tropical cyclone characteristics and their damage potential in the Atlantic basin. We model the joint probability distribution of tropical cyclone characteristics and their damage…
Understanding how housing prices respond to spatial accessibility, structural attributes, and typological distinctions is central to contemporary urban research and policy. In cities marked by affordability stress and market segmentation,…
We estimate the effect of playing in one's home country in professional squash using a Bayesian hierarchical model applied to men's and women's Professional Squash Association matches from 2018-2024. The model incorporates players' world…
The Yau-Yau nonlinear filter has increasingly emerged as a powerful tool to study stochastic complex systems. To extend it to a wider spectrum of application scenarios, we pack the Yau-Yau filtering ALgorithms (YauYauAL) into a package of…
Accurately estimating latent velocity vector fields of atmospheric winds is crucial for understanding weather phenomena. Direct measurement of atmospheric winds is costly, especially in the upper atmosphere, so researchers attempt to…
Uncertainty reduction is vital for improving system reliability and reducing risks. To identify the best target for uncertainty reduction, an uncertainty importance measure is commonly used to prioritize the significance of input variable…
Global sensitivity analysis (GSA) can provide rich information for controlling output uncertainty. In practical applications, segmented models are commonly used to describe an abrupt model change. For segmented models, the complicated…
A reasonable description of the degradation process is essential for credible reliability assessment in accelerated degradation testing. Existing methods usually use Markovian stochastic processes to describe the degradation process.…
In epidemiological studies, zero-inflated and hurdle models are commonly used to handle excess zeros in reported infectious disease cases. However, they cannot model the persistence (changing from presence to presence) and reemergence…
This study introduces the Exponentiated-Exponential-Pareto-Half Normal Mixture Distribution (EEPHND), a novel hybrid model developed to overcome the limitations of classical distributions in modeling complex real-world data. By compounding…
Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity using an individual's geolocation and surname. Here we demonstrate that statistical dependence of surname and geolocation within racial/ethnic…
In this work, we introduce an extremely general model for a collection of innovation processes in order to model and analyze the interaction among them. We provide theoretical results, analytically proven, and we show how the proposed model…
Waterfall plots are a key tool in early-phase oncology clinical studies for visualizing individual patients' tumor size changes and assessing efficacy. However, comparing waterfall plots from ongoing studies with limited follow-up…
Electronic health records (EHR) often contain varying levels of missing data. This study compared different imputation strategies to identify the most suitable approach for predicting central line-associated bloodstream infection (CLABSI)…
In an ecological context, panel data arise when time series measurements are made on a collection of ecological processes. Each process may correspond to a spatial location for field data, or to an experimental ecosystem in a designed…
Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative frequency of the counts. Multiple authors…
We propose a new framework that focuses on on-site entities in the digital twin, a pairing of the real world and digital space. Characteristics include active sensing to generate event logs, spatial and temporal partitioning of complex…
Tennis is one of the world's biggest and most popular sports. Multiple researchers have, with limited success, modeled the outcome of matches using probability modelling or machine learning approaches. The approach presented here predicts…
In this study, machine learning models were tested to predict whether or not a customer of an insurance company would purchase a travel insurance product. For this purpose, secondary data provided by an open-source website that compiles…