Applied Statistics
Research on shared micromobility relies on trip modeling and user data. User segmentation in shared micromobility systems is traditionally studied by aggregating trip-level observations into user-specific summary measures before…
Precipitation forecasts are judged by accuracy, but the decisions they support (when to restrict water, when to warn of drought) turn on noticing when a local regime is becoming abnormal, which forecast scores alone do not reveal. We…
In this paper, we look at the relationships that economic variables have with adverse health outcomes in the western counties of Washington, Idaho, Oregon, California, and Nevada, with specific emphasis on how suicide rate relates to such…
In many service systems, estimating customers' waiting times can support decisions aimed at enhancing operational efficiency, improving the customer experience, and ensuring efficient resource…
Accurate modeling of wind turbine power curves is crucial for optimal wind farm operation. Nearly all existing power curve models focus on temporal variables such as wind speed and temperature while overlooking the influence of terrain…
Argo profiling floats measure seawater temperature and salinity in the upper 2000 meters of the ocean. These floats are uniquely capable of measuring the global Ocean Heat Content (OHC), a quantity that is of central importance for…
Protein-protein interaction (PPI) networks, estimated from high-throughput omics data, foster biomarker discovery and precision medicine. Gaussian graphical models (GGMs) offer a principled reconstruction framework. Yet, existing…
Ranked choice voting (RCV) is a popular alternative voting method in which voters are asked to list their favored candidates in preference order, rather than vote for a single candidate. When these ballots are tabulated, candidates are…
The U.S. Census Bureau's Low Response Score (LRS) is a central planning instrument for identifying places likely to require additional self-response outreach and nonresponse follow-up. The published LRS is intentionally interpretable: it…
Large language models (LLMs) are interactive stochastic systems whose most consequential behaviors are still only partially understood. This discussion argues that statistics curricula should treat LLMs not only as tools, but as objects of…
Inferring the direction of a gene-regulatory relationship is harder than inferring whether a relationship exists, and most direction-inference methods are validated mainly on a single in silico benchmark. We ask which method remains…
The paper "Use of roster charts in the investigation and prosecution of nurses suspected of inflicting deliberate harm on patients" by Prof. John O'Quigley explores an interesting hypothesis concerning statistical information hidden in the…
Classical actuarial pricing models, such as the generalized linear model, are valued for transparency and ease of governance, but they use interactions among risk factors only when these are supplied through explicit feature engineering. We…
Rubin multiple imputation (MI) generates plausible data completions to account for uncertainty and statistical variability but provides little insight into their global organization. We introduce a topological reconstruction approach that…
Biomedical research is increasingly relying on readily available routine data, such as electronic health records. Routinely collected data, as well as datasets from large cohorts, are often prone to measurement error which, if not addressed…
Fractional Brownian motion has been widely used in financial modeling to capture long-range dependence and persistent behavior observed in asset dynamics. In the fractional Black-Scholes framework, accurate estimation of the Hurst…
This paper presents push puppet networks, a novel Bayesian algorithm for structured pruning of large language models. The push puppet network learns a hierarchical function during training that can optimally determine specific network…
The coupon incentive is one of the most common tools marketers use to court users to engage with a business at various stages of the customer life cycle. A variety of factors can affect the effectiveness of a coupon incentive on users,…
The airborne fraction is the share of anthropogenic carbon dioxide emissions that remains in the atmosphere and is a key indicator of carbon-cycle response and remaining carbon budgets under continued emissions. Whether this share is rising…
Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for producing unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of…