Applied Statistics
Polygenic risk scores can be used to model the individual genetic liability for human traits. Current methods primarily focus on modeling the mean of a phenotype, neglecting the variance. However, genetic variants associated with phenotypic…
Trade credit insurance (TCI) is a specialized line of property and casualty insurance, protecting businesses against financial losses due to a buyer's insolvency. Predictive modeling for TCI claims poses formidable challenges due to the…
We use logistic regression to estimate the value of the pieces in standard chess and several chess variants, namely Chess 960, Atomic chess, Antichess, and Horde chess. We perform our regressions on several years of data from Lichess, the…
We present reslr, an R package to perform Bayesian modelling of relative sea level data. We include a variety of different statistical models previously proposed in the literature, with a unifying framework for loading data, fitting models,…
We propose a Bayesian, noisy-input, spatial-temporal generalised additive model to examine regional relative sea-level (RSL) changes over time. The model provides probabilistic estimates of component drivers of regional RSL change via the…
This paper proposes a deep learning-based approach for in-situ process monitoring that captures nonlinear relationships between in-control high-dimensional process signature signals and offline product quality data. Specifically, we…
Representative democracy in the United States relies on election systems that transmit votes into representatives in three key bodies: the two chambers of the federal legislature (House of Representatives and Senate) and the Electoral…
We propose three spatial methods for estimating the full probability distribution of PM10 concentrations, with the ultimate goal of assessing air quality in Northern Italy. Moving beyond spatial averages and simple indicators, we adopt a…
In the US, "black box" studies are increasingly being used to estimate the error rate of forensic disciplines. A sample of forensic examiner participants is asked to evaluate a set of items whose source is known to the researchers but not…
Serum prostate-specific antigen (PSA) is widely used for prostate cancer screening. While the genetics of PSA levels has been studied to enhance screening accuracy, the genetic basis of PSA velocity, the rate of PSA change over time,…
This article investigates the information flow between 13 Green Bond ETFs (Exchange Traded Funds) from three global markets: the USA, Canada, and Europe, between 2021 and 2022. We used the transfer entropy and effective transfer entropy…
Residential electricity demand at granular scales is driven by what people do and for how long. Accurately forecasting this demand for applications like microgrid management and demand response therefore requires generative models that can…
We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived quality. In total, we received 1,342…
Mammographic density is a dynamic risk factor for breast cancer and affects the sensitivity of mammography-based screening. While automated machine and deep learning-based methods provide more consistent and precise measurements compared to…
Missing data is among the most prominent challenges in the analysis of physical activity (PA) data collected from wearable devices, with the threat of nonignorable missingness arising when patterns of device wear relate to underlying…
Human exposure to chemicals commonly arises from multiple sources, yet traditional assessments often treat these sources in isolation, overlooking their combined impact. We introduce a Bayesian framework for aggregated chemical exposure…
Clustering algorithms have become an essential part of the neurophysiological data analysis toolbox over the last twenty-five years. Many problems, from the definition of cell types/groups based on morphological, molecular and physiological data…
In recent years, Bayesian statistics has gained traction across a wide range of scientific disciplines. This paper explores the growing application of Bayesian methods within the field of linguistics and considers their future potential. A…
Independent component analysis (ICA) is widely used to separate mixed signals and recover statistically independent components. However, in non-human primate neuroimaging studies, most ICA-recovered spatial maps are dense. To extract…
Individual patient data (IPD) are essential for statistical inference in clinical research. However, privacy concerns, high data-sharing costs, and restrictive access often make IPD unavailable. Conventional synthetic data generation…