Applied Statistics - Scifaro

Detecting gene-environment interactions to guide personalized intervention: boosting distributional regression for polygenic scores

Polygenic risk scores can be used to model the individual genetic liability for human traits. Current methods primarily focus on modeling the mean of a phenotype, neglecting the variance. However, genetic variants associated with phenotypic…

Applied Statistics · Statistics · 2025-09-26 · Qiong Wu, Hannah Klinkhammer, Kiran Kunwar, Christian Staerk, Carlo Maj, Andreas Mayr

Statistical Learning of Trade Credit Insurance Network Data with Applications to Ratemaking and Reserving

Trade credit insurance (TCI) is a specialized line of property and casualty insurance, protecting businesses against financial losses due to a buyer's insolvency. Predictive modeling for TCI claims poses formidable challenges due to the…

Applied Statistics · Statistics · 2025-09-26 · Woongchae Yoo, Spark C. Tseung, Tsz Chai Fung

Inferring Piece Value in Chess and Chess Variants

We use logistic regression to estimate the value of the pieces in standard chess and several chess variants, namely Chess 960, Atomic chess, Antichess, and Horde chess. We perform our regressions on several years of data from Lichess, the…

Applied Statistics · Statistics · 2025-09-26 · Steven Pav

reslr: An R package for relative sea level modelling

We present reslr, an R package to perform Bayesian modelling of relative sea level data. We include a variety of different statistical models previously proposed in the literature, with a unifying framework for loading data, fitting models,…

Applied Statistics · Statistics · 2025-09-26 · Maeve Upton, Andrew Parnell, Niamh Cahill

A noisy-input generalised additive model for relative sea-level change along the Atlantic coast of North America

We propose a Bayesian, noisy-input, spatial-temporal generalised additive model to examine regional relative sea-level (RSL) changes over time. The model provides probabilistic estimates of component drivers of regional RSL change via the…

Applied Statistics · Statistics · 2025-09-26 · Maeve Upton, Andrew Parnell, Andrew Kemp, Erica Ashe, Gerard McCarthy, Niamh Cahill

Quality-Ensured In-Situ Process Monitoring with Deep Canonical Correlation Analysis

This paper proposes a deep learning-based approach for in-situ process monitoring that captures nonlinear relationships between in-control high-dimensional process signature signals and offline product quality data. Specifically, we…

Applied Statistics · Statistics · 2025-09-25 · Xiaoyang Song, Wenbo Sun, Metin Kayitmazbatir, Jionghua Jin

One Person, How Many Votes? Demographic Distortions in United States Elections

Representative democracy in the United States relies on election systems that transmit votes into representatives in three key bodies: the two chambers of the federal legislature (House of Representatives and Senate) and the Electoral…

Applied Statistics · Statistics · 2025-09-25 · Lee Kennedy-Shaffer

Three Distributional Approaches for PM10 Assessment in Northern Italy

We propose three spatial methods for estimating the full probability distribution of PM10 concentrations, with the ultimate goal of assessing air quality in Northern Italy. Moving beyond spatial averages and simple indicators, we adopt a…

Applied Statistics · Statistics · 2025-09-25 · Marco F. De Sanctis, Andrea Gilardi, Giacomo Milan, Laura M. Sangalli, Francesca Ieva, Piercesare Secchi

A Variance Decomposition Approach to Inconclusives in Forensic Black Box Studies

In the US, 'black box' studies are increasingly being used to estimate the error rate of forensic disciplines. A sample of forensic examiner participants are asked to evaluate a set of items whose source is known to the researchers but not…

Applied Statistics · Statistics · 2025-09-25 · Amanda Luby, Joseph B. Kadane

Estimating the Heritability of Longitudinal Rate-of-Change: Genetic Insights into PSA Velocity in Prostate Cancer-Free Individuals

Serum prostate-specific antigen (PSA) is widely used for prostate cancer screening. While the genetics of PSA levels has been studied to enhance screening accuracy, the genetic basis of PSA velocity, the rate of PSA change over time,…

Applied Statistics · Statistics · 2025-09-25 · Pei Zhang, Xiaoyu Wang, Jianxin Shi, Paul S. Albert

The information flow among Green Bonds exchange traded funds

This article investigates the information flow between 13 Green Bond ETFs (Exchange Traded Funds) from three global markets: the USA, Canada, and Europe, between 2021 and 2022. We used transfer entropy and effective transfer entropy…

Applied Statistics · Statistics · 2025-09-24 · Wenderson Gomes Barbosa, Kerolly Kedma Felix do Nascimento, Fabio Sandro dos Santos, Tiago A. E. Ferreira

Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences

Residential electricity demand at granular scales is driven by what people do and for how long. Accurately forecasting this demand for applications like microgrid management and demand response therefore requires generative models that can…

Applied Statistics · Statistics · 2025-09-24 · Rohit Dube, Natarajan Gautam, Amarnath Banerjee, Harsha Nagarajan

The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived quality. In total, we received 1,342…

Applied Statistics · Statistics · 2025-09-24 · Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

The DeepJoint algorithm: An innovative approach for studying the longitudinal evolution of quantitative mammographic density and its association with screen-detected breast cancer risk

Mammographic density is a dynamic risk factor for breast cancer and affects the sensitivity of mammography-based screening. While automated machine and deep learning-based methods provide more consistent and precise measurements compared to…

Applied Statistics · Statistics · 2025-09-24 · Manel Rakez, Julien Guillaumin, Aurelien Chick, Gaelle Coureau, Foucauld Chamming's, Pierre Fillard, Brice Amadeo, Virginie Rondeau

Bayesian Nonhomogeneous hidden Markov models to leverage routine in physical activity monitoring with informative wear time

Missing data is among the most prominent challenges in the analysis of physical activity (PA) data collected from wearable devices, with the threat of nonignorable missingness arising when patterns of device wear relate to underlying…

Applied Statistics · Statistics · 2025-09-23 · Beatrice Cantoni, Savannah V. Rauschendorfer, Michael E. Roth, J. Andrew Livingston, Eugenie S. Kleinerman, Corwin M. Zigler

A Bayesian approach to aggregated chemical exposure assessment

Human exposure to chemicals commonly arises from multiple sources, yet traditional assessments often treat these sources in isolation, overlooking their combined impact. We introduce a Bayesian framework for aggregated chemical exposure…

Applied Statistics · Statistics · 2025-09-23 · Sophie Van Den Neucker, Alexander Grigoriev, Heidi Demaegdt, Jan Mast, Karlien Cheyns, Sofie De Broe, Roberto Cerina

ToMATo: an efficient and robust clustering algorithm for high dimensional datasets. An illustration with spike sorting

Clustering algorithms have become an essential part of the neurophysiological data analysis toolbox over the last twenty-five years. Many problems, from the definition of cell types/groups based on morphological, molecular and physiological data…

Applied Statistics · Statistics · 2025-09-23 · Louise Martineau, Christophe Pouzat, Ségolen Geffray

A Bayesian dawn in linguistics: Trends, benefits and good practices

In recent years, Bayesian statistics has gained traction across a wide range of scientific disciplines. This paper explores the growing application of Bayesian methods within the field of linguistics and considers their future potential. A…

Applied Statistics · Statistics · 2025-09-23 · Natalia Levshina

Efficient Brain Network Estimation with Sparse ICA in Non-Human Primate Neuroimaging

Independent component analysis (ICA) is widely used to separate mixed signals and recover statistically independent components. However, in non-human primate neuroimaging studies, ICA-recovered spatial maps are often dense. To extract…

Applied Statistics · Statistics · 2025-09-23 · Qiang Li, Liang Ma, Masoud Seraji, Shujian Yu, Yun Wang, Jingyu Liu, Vince D. Calhoun

SynthIPD: assumption-lean synthetic individual patient data generation

Individual patient data (IPD) are essential for statistical inference in clinical research. However, privacy concerns, high data-sharing costs, and restrictive access often make IPD unavailable. Conventional synthetic data generation…

Applied Statistics · Statistics · 2025-09-23 · Zixuan Zhao, Zexin Ren, Guannan Zhai, Feifang Hu, Will Ma, En Xie, Qian Shi