Statistical Methodology
Spherical regression, in which both covariates and responses lie on the sphere, arises in many scientific applications and has attracted considerable methodological attention in recent years. Despite this progress, constructing flexible and…
Effect modification means the size of a treatment effect varies with an observed covariate. Generally speaking, a larger treatment effect with more stable error terms is less sensitive to bias. Thus, we might be able to conclude that a…
Understanding the links between diet, metabolic changes, and health outcomes is a key focus in nutritional science and broader biological research. Analyzing relationships, such as those between ultra-processed food (UPF) intake and…
Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel…
Chatterjee (2021) introduced a novel independence test that is rank-based, asymptotically normal and consistent against all alternatives. One limitation of Chatterjee's test is its low statistical power for detecting monotonic…
This paper develops a flexible distribution-free method for collective outlier detection and enumeration, designed for situations in which the presence of outliers can be detected powerfully even though their precise identification may be…
Hidden Markov models (HMMs) are characterized by an unobservable Markov chain and an observable process -- a noisy version of the hidden chain. Decoding the original signal from the noisy observations is one of the main goals in nearly all…
This note introduces FRESH (Fusion of Recent Evidence and Subject Histories), a method for incorporating population-level summary results -- published clinical trials, registry summaries, prior natural-history studies, and peer-reviewed…
Probability integral transforms (PITs) and empirical $p$-values are widely used to assess the calibration of predictive distributions. While exact PIT values are uniformly distributed under correct model specification, practical…
Recent advances in data collection technologies have led to the emergence of massive spatial datasets, with measurements obtained at millions of spatial locations. Geostatistical models typically employ Gaussian processes (GPs) to capture…
High-dimensional classification problems often rely on the Lasso-penalized linear Support Vector Machines (SVMs). However, the double non-smoothness induced by the hinge loss and Lasso penalty in this model makes statistical inference…
The correct inferential object in claims reserving is the conditional predictive distribution $p(R \mid \mathcal{D}, \hat\theta)$, where $\mathcal{D}$ is the observed triangle held fixed. We refer to this as the conditioning principle. All…
The distance dependent Chinese Restaurant Process (ddCRP) provides a flexible prior distribution for clustering observations, incorporating covariate information through pairwise distances and accommodating a rich variety of cluster…
The Chain-Ladder (CL) method remains the dominant macro-level technique for claims reserving in non-life insurance, yet its classical formulation lacks a coherent probabilistic foundation. Existing stochastic extensions -- including the Mack…
In regression models fitted to data from complex survey designs, sampling weights often incorporate non-essential variation, inflating variance estimates. Stabilized weights mitigate this issue by adjusting sampling weights to account for…
Causal inference with time-to-event outcomes is fundamental in various scientific studies. In a static setup with fitted propensity scores, weighted Kaplan-Meier estimation for survival probabilities and weighted Breslow-Peto estimation for…
We propose a joint order-based scoring framework for causal structure learning of directed acyclic graph (DAG) models under heterogeneous data settings. We show that leveraging heterogeneity improves the accuracy of causal ordering…
Background: Survival prediction models are often less reliable in clinical groups with limited sample sizes or few outcome events. Target-only models may be unstable, whereas models from larger cohorts may transfer poorly when risk-factor…
Prewhitening is a common way to handle strong autocorrelation. Motivated by prewhitening, this article proposes a new approach called tail postcoloring, which uses parametric models to project, or color back, the neglected tail…
Subgroup analyses within randomized controlled trials are often underpowered due to limited sample sizes. We address this challenge by leveraging trial participants outside the subgroup of interest to augment estimation within the subgroup.…