Applied Statistics
We propose a novel methodology relating item response theory methods with small area estimation strategies in the presence of missing data. Specifically, we propose an unbiased estimator for the average ability parameter of three-parameter…
It is still largely unclear to what extent bettors update their prior assumptions about the strength and form of competing teams considering the dynamics during the match. This is of interest not only from the psychological perspective, but…
This paper proposes a uniqueness Shapley measure to compare the extent to which different variables are able to identify a subject. Revealing the value of a variable on subject $t$ shrinks the set of possible subjects that $t$ could be. The…
Democracies employ elections at various scales to select officials at the corresponding levels of administration. The geographical distribution of political opinion, the policy issues delegated to each level, and the multilevel interactions…
Smartphone-based earthquake early warning systems implemented by citizen science initiatives are characterized by significant variability in their smartphone network geometry. This has a direct impact on the earthquake detection…
Epidemiologic and medical studies often rely on evaluators to obtain measurements of exposures or outcomes for study participants, and valid estimates of associations depend on the quality of the data. Even though statistical methods have been…
Real-time bidding has transformed the digital advertising landscape, allowing companies to buy website advertising space in a matter of milliseconds in the time it takes a webpage to load. Joint research between Cardiff University and…
Non-Negative Matrix Factorization (NMF) is a widely used dimension reduction method that factorizes a non-negative data matrix into two lower dimensional non-negative matrices: One is the basis or feature matrix which consists of the…
Aggregated relational data (ARD), formed from "How many X's do you know?" questions, is a powerful tool for learning important network characteristics with incomplete network data. Compared to traditional survey methods, ARD is attractive…
Bayesian model updating based on Gaussian Process (GP) models, which incorporates kernel-based GPs to provide enhanced-fidelity response predictions, has received attention in recent years. Although most kernel functions provide high fitting…
Dynamic stochastic general equilibrium (DSGE) models have been a ubiquitous, and controversial, part of macroeconomics for decades. In this paper, we approach DSGEs purely as statistical models. We do this by applying two common model…
Estimating forest AGB at large scales and fine spatial resolutions has become increasingly important for greenhouse gas accounting, monitoring, and verification efforts to mitigate climate change. Airborne LiDAR is highly valuable for…
This paper details the approach of the team $\textit{Kohrrelation}$ in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from…
Tropical cyclones present a serious threat to many coastal communities around the world. Many numerical weather prediction models provide deterministic forecasts with limited measures of their forecast uncertainty. Standard postprocessing…
The biogeochemical complexity of environmental models is increasing continuously, and model reliability must be reanalysed when new implementations are introduced. This work aims to identify influential biogeochemical parameters that…
Online experimentation platforms collect user feedback at low cost and large scale. Some systems even support real-time or near real-time data processing, and can update metrics and statistics continuously. Many commonly used metrics, such…
Researchers are often faced with evaluating the effect of a policy or program that was simultaneously initiated across an entire population of units at a single point in time, and its effects over the targeted population can manifest at any…
This survey provides an overview of common applications, both implicit and explicit, of "tensors" and "tensor products" in the fields of data science and statistics. One goal is to reconcile seemingly distinct usages of the term "tensor" in…
This paper addresses the problem of fault diagnosis in multistation assembly systems. Fault diagnosis aims to identify, from dimensional measurements, the process faults that cause excessive dimensional variation of the product. For such…
Interval-valued data has received much attention owing to its wide application in finance, econometrics, meteorology and medicine. However, most regression models developed for interval-valued data assume observations are mutually…