应用统计
Family planning is a global development priority and a key indicator of reproductive health. Monitoring progress is challenged by gaps in survey data across countries. The United Nations Population Division addresses this with the Family…
High temperatures are associated with adverse respiratory health outcomes and increases in ambient air pollution. Limited research has quantified air pollution's mediating role in the relationship between temperature and respiratory…
This study addresses a central tactical dilemma for football coaches: whether to employ a defensive strategy, colloquially known as "parking the bus", or a more offensive one. Using an advanced Double Machine Learning (DML) framework, this…
We study the problem of selecting a subset of patients who are unlikely to experience an adverse event within a fixed time horizon by calibrating a screening rule based on a black-box survival model. We consider two complementary,…
A major public health issue is the growing resistance of bacteria to antibiotics. An important part of the needed response is the discovery and development of new antimicrobial strategies. These require the screening of potential new drugs,…
Low physical activity is a known risk factor for major depressive disorder (MDD), but changes in activity before a first clinical diagnosis remain unclear, especially using long-term objective measurements. This study characterized…
Online media platforms often need to measure how frequently users are exposed to specific content attributes in order to evaluate trade-offs in A/B experiments. A direct approach is to sample content, label it using a high-quality rubric…
The chain-ladder (CL) method is the most widely used claims reserving technique in non-life insurance. This manuscript introduces a novel approach to computing the CL reserves based on a fundamental restructuring of the data utilization for…
Air quality and climate are major issues in Italian society and lie at the intersection of many research fields, including public health and policy planning. There is an increasing need for readily available, easily accessible, ready-to-use…
The present study explores a cost-effective method for using activated ground granulated blast furnace slag (GGBFS) and silica fume (SF) as cement substitutes. Instead of activating them with expensive alkali solutions, the present study…
Online platforms require robust systems to enforce content safety policies at scale. A critical component of these systems is the ability to evaluate the quality of moderation decisions made by both human agents and Large Language Models…
In 2015 the Open Science Collaboration (OSC) (Nosek et al 2015) published a highly influential paper which claimed that a large fraction of published results in the psychological sciences were not reproducible. In this article we review…
We examine static and dynamic social network structure in 176 villages within the Copan Department of Honduras across two data waves (2016, 2019), using detailed data on multiplex networks for 20,232 individuals enrolled in a longitudinal…
Katz, Savage, and Brusch propose a two-part forecasting method for sectors where event timing differs from recording time. They treat forecasting as a time-shift operation, using univariate time series for total bookings and a Bayesian…
The Highway Performance Monitoring System, managed by the Federal Highway Administration, provides data on average annual daily traffic volume across roadways in the United States, but it has limited representation of medium- and heavy-duty…
An Event-Related Potential (ERP)-based Brain-Computer Interface (BCI) Speller System assists people with disabilities to communicate by decoding electroencephalogram (EEG) signals. A P300-ERP embedded in EEG signals arises in response to a…
Individual-level epidemic models are increasingly being used to help understand the transmission dynamics of various infectious diseases. However, fitting such models to individual-level epidemic data is challenging, as we often only know…
Repeating an imperfect biomarker test based on an initial result can introduce bias and influence misclassification risk. For example, in some blood donation settings, blood donors' hemoglobin is remeasured when the initial measurement…
We systematically evaluate the reproducibility of data analysis conducted by Large Language Models (LLMs). We evaluate two prompting strategies, six models, and four temperature settings, with ten independent executions per configuration,…
The coronavirus disease 2019 (COVID-19) pandemic has caused a reduction in business and routine activity and resulted in less motor fuel consumption. Thus, the gas tax revenue is reduced which is the major funding resource supporting the…