Applied Statistics
Ideal point estimation methods face a significant challenge when legislators engage in protest voting, that is, strategically voting against their party to express dissatisfaction. Such votes introduce attenuation bias, making ideologically…
Electoral boundaries in Malaysia are not publicly available in machine-readable form. This prevents rigorous analysis of geography-centric issues such as malapportionment and gerrymandering, and constrains spatial perspectives on electoral…
To what extent are users surveilled on the web, by what technologies, and by whom? We answer these questions by combining passively observed, anonymized browsing data of a large, representative sample of Americans with domain-level data on…
The Dynamic Nelson-Siegel (DNS) model is a widely used framework for term structure forecasting. We propose a novel extension that models DNS residuals as a Gaussian random field, capturing dependence across both time and maturity. The…
This study innovates geometric morphometrics by incorporating functional data analysis, the square-root velocity function (SRVF), and arc-length parameterisation for 3D morphometric data, leading to the development of seven new pipelines in…
Electronic Health Records (EHRs) contain extensive patient information that can inform downstream clinical decisions, such as mortality prediction, disease phenotyping, and disease onset prediction. A key challenge in EHR data analysis is…
The Population Stability Index (PSI) is a widely used measure in credit risk modeling and monitoring within the banking industry. Its purpose is to monitor for changes in the population underlying a model, such as a scorecard, to ensure…
Cold standby 1-out-of-n redundant systems are well-established models in system reliability engineering. To date, reliability analyses of such systems have predominantly assumed exponential, Erlang, or Weibull failure distributions for…
As AI systems are increasingly used to guide decisions, it is essential that they follow ethical principles. A core principle in medicine is non-maleficence, often equated with "do no harm". A formal definition of harm based on…
This article develops a Bayesian hierarchical framework to analyze academic performance in the 2022 second semester Saber 11 examination in Colombia. Our approach combines multilevel regression with municipal and departmental spatial random…
Rainfall forecasting plays a critical role in climate adaptation, agriculture, and water resource management. This study develops long-term forecasts of monthly rainfall across 19 districts of West Bengal using a century-scale dataset…
Integrating diverse data sources offers a comprehensive view of patient health and holds potential for improving clinical decision-making. In Cystic Fibrosis (CF), which is a genetic disorder primarily affecting the lungs, biomarkers that…
Univariate zero-inflated models are increasingly being used to account for excess zeros in spatio-temporal infectious disease counts. However, the multivariate case is challenging due to the need to account for correlations across space,…
Forecasting conflict-related fatalities remains a central challenge in political science and policy analysis due to the sparse, bursty, and highly non-stationary nature of violence data. We introduce DynAttn, an interpretable…
Algorithmic lending has transformed the consumer credit landscape, with complex machine learning models now commonly used to make or assist underwriting decisions. To comply with fair lending laws, these algorithms typically exclude legally…
In data-sparse regions, satellite and reanalysis rainfall estimates (SREs) are vital but limited by inherent biases. This study evaluates bias correction (BC) methods, including traditional statistical (LOCI, QM) and machine learning (SVR,…
Behavioral risk factors, i.e., smoking, poor nutrition, alcohol misuse, and physical inactivity (SNAP), are leading contributors to chronic diseases and healthcare costs worldwide. Their prevalence is shaped not only by demographic…
The global impetus for extracting rare earth elements (REEs) is shaping the future of green technologies. From high-efficiency magnets in wind turbines to advanced batteries and solar photovoltaics, REEs are indispensable for a greener…
We propose an ARIMA-TX-GARCH model and use it to forecast European Carbon Emission Allowance futures prices, incorporating Brent crude oil futures prices as an exogenous variable.
Occupational data play a vital role in research, official statistics, and policymaking, yet their collection and accurate classification remain a challenge. This study investigates the effects of occupational question wording on data…