Applied Statistics
Greenhouse decisions often rely on static thresholds, yet crop output switches among microclimate-driven regimes. We frame daily cucumber yield as transitions among three ordered states and fit a continuous-time, covariate-dependent…
Heavy-tailed probability distributions are extremely useful and play a crucial role in modeling different types of financial data sets. This study presents a two-pronged methodology. First, a mixture probability distribution is created by…
This study applies Multiscale Geographically Weighted Regression (MGWR) to examine the spatial determinants of household wealth in Bernalillo County, New Mexico. The model incorporates sociodemographic, environmental, and proximity-based…
Understanding and modeling mortality patterns, especially differences in mortality rates between populations, is vital for demographic analysis and public health planning. We compare three statistical models within the age-period framework…
A probabilistic clustering algorithm is proposed for the analysis of forensic DNA mixtures in which individual cells are isolated and short tandem repeats are amplified using the polymerase chain reaction to generate single cell…
Fault detection is essential in complex industrial systems to prevent failures and optimize performance by distinguishing abnormal from normal operating conditions. With the growing availability of condition monitoring data, data-driven…
As use of park-and-ride facilities grows, understanding and predicting car park occupancy is becoming increasingly important. This study presents a model that effectively captures the occupancy patterns of park-and-ride…
Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its…
Objectives: We propose a novel imputation method tailored for Electronic Health Records (EHRs) with structured and sporadic missingness. Such missingness frequently arises in the integration of heterogeneous EHR datasets for downstream…
Gene expression levels, hormone secretion, and internal body temperature each oscillate over an approximately 24-hour cycle; that is, they display circadian rhythms. Many circadian biology studies have investigated how these rhythms vary across…
Predicting species distributions using occupancy models accounting for imperfect detection is now commonplace in ecology. Recently, modelling spatial and temporal autocorrelation was proposed to alleviate the lack of replication in…
Class imbalance is a pervasive problem in predictive toxicology, where the number of non-toxic compounds often exceeds the number of toxic ones. Models trained on such data often perform well on the majority class but poorly on the minority…
The spatial-temporal imbalance between supply and demand in shared micro-mobility services often leads to observed demand being censored, resulting in incomplete records of the underlying real demand. This phenomenon undermines the…
Most capture-recapture models assume that individuals either do not emigrate or emigrate permanently from the sampling area during the sampling period. This assumption is violated when individuals temporarily leave the sampling area and…
The 2024 July Revolution in Bangladesh represents a landmark event in the study of civil resistance. This study investigates the central paradox of the success of this student-led civilian uprising: how state violence, intended to quell…
Missing data in financial panels presents a critical obstacle, undermining asset-pricing models and reducing the effectiveness of investment strategies. Such panels are often inherently multi-dimensional, spanning firms, time, and financial…
Italy reports some of the lowest levels of mortality in the developed world. Recent evidence, however, suggests that even in low mortality countries improvements may be slowing and regional inequalities widening. This study contributes new…
Atrial fibrillation (AF) is a common cardiac arrhythmia characterised by disordered electrical activity in the atria. The standard treatment is catheter ablation, which is invasive and irreversible. Recent advances in computational…
In this paper, we first situate the challenges for measuring data quality under Project Lighthouse in the broader academic context. We then discuss in detail the three core data quality metrics we use for measurement--two of which extend…
Understanding the dependence structure of asset returns is fundamental in risk assessment and is particularly relevant in a portfolio diversification strategy. We propose a clustering approach where evidence accumulated in a multiplicity of…