Applied Statistics
Statistical modelling of covariate distributions makes it possible to generate virtual populations or to impute missing values in a covariate dataset. Covariate distributions typically have non-Gaussian margins and show nonlinear correlation…
Acts of political violence in the continental United States have increased dramatically in the last decade. Given this rise, we are interested in where and when such incidents occur: how are the locations and times of…
Confidential administrative data is usually available only to researchers within a trusted research environment (TRE). Recently, some UK groups have proposed that low-fidelity synthetic data (LFSD) be made available to researchers outside the…
With advances in high-resolution mass spectrometry technologies, metabolomics data are increasingly used to investigate biological mechanisms underlying associations between exposures and health outcomes in clinical and epidemiological…
In many biomedical applications with high-dimensional features, such as single-cell RNA-sequencing, it is not uncommon to observe numerous structural zeros. Identifying important features from a pool of high-dimensional data for subsequent…
Forcing someone into marriage against their will is a violation of their human rights. In 2021, the county of Nottinghamshire, UK, launched a strategy to tackle forced marriage and violence against women and girls. However, accessing…
This article introduces a non-parametric information-theoretic approach to inference about the tail of a continuous or a discrete distribution. Leveraging a new concept named tail profile -- a set of information-theoretic quantities…
Bayesian optimal experiments that maximize the information gained from collected data are critical to efficiently identify behavioral models. We extend a seminal method for designing Bayesian optimal experiments by introducing two…
Maximal strength increases with body weight, which is why scoring methods have been developed to scale powerlifting performances fairly according to athletes' body weight. The International Powerlifting Federation (IPF) Good Lift (GL)…
Several data-driven approaches based on information theory have been proposed for analyzing high-order interactions involving three or more components of a network system. Most of these methods are defined only in the time domain and rely…
Air pollution remains a critical environmental and public health challenge, demanding high-resolution spatial data to better understand its spatial distribution and impacts. This study addresses the challenges of conducting multivariate…
In this study, we propose a novel application of spatiotemporal clustering in the environmental sciences, with a particular focus on regionalised time series of greenhouse gas (GHG) emissions from a range of economic sectors. Utilising a…
Gas sampling methods have been crucial for the advancement of combustion science, enabling analysis of reaction kinetics and pollutant formation. However, the measured composition can deviate from the true one because of the potential…
In the present paper, the author discusses the Generalized Odd Median Base Unit Rayleigh (GOMBUR) in relation to the Median Based Unit Rayleigh (MBUR) to evaluate the additive value of the new shape parameter on the estimation process as…
Accurate estimation of meal macronutrient composition is a prerequisite for precision nutrition, metabolic health monitoring, and glycemic management. Traditional dietary assessment methods, such as self-reported food logs or diet recalls…
In this paper, the author presents the generalized form of the Median-Based Unit Rayleigh (MBUR) distribution, a novel statistical distribution that is specifically defined within the interval (0, 1) expressing oscillating hazard rate…
Tensor-based morphometry (TBM) aims at showing local differences in brain volumes with respect to a common template. TBM images are smooth but they exhibit (especially in diseased groups) higher values in some brain regions called lateral…
Recent studies have shown associations between redlining policies (1935-1974) and present-day fine particulate matter (PM$_{2.5}$) and nitrogen dioxide (NO$_2$) air pollution concentrations. In this paper, we reevaluate these associations…
This paper discusses the effect of measurement errors in the estimation of the carbon dioxide (CO$_2$) airborne fraction. We are the first to present regression-based estimates and standard errors that are robust to measurement errors for…
In high-throughput screenings, it is common to estimate the effects of many treatments using a small number of independent trials of each. Because little is known about the distributional properties of the measurements from these trials, it…