Statistical Methodology
E-variables enable safe and anytime-valid inference, with log-optimal e-variables given by the likelihood ratio of the least favorable distributions (LFDs) when they exist in composite settings. While this unconstrained theory is well…
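The simplest case is a simple null $P_0$ against a simple alternative $Q$. Here the least favorable pair is just $(P_0, Q)$, and the log-optimal e-variable is the likelihood ratio $q(X)/p_0(X)$, whose expectation under $P_0$ is one. The Python sketch below assumes i.i.d. Bernoulli data with hypothetical parameters `p0` and `p1`. It multiplies the per-observation e-values and stops once the running product reaches $1/\alpha$; by Ville's inequality, this gives an anytime-valid level-$\alpha$ test. It does not cover the composite or constrained settings the abstract goes on to discuss.

```python
def bernoulli_lr_evalue(x, p0, p1):
    """Likelihood-ratio e-value q(x) / p0(x) for one Bernoulli observation x in {0, 1}."""
    num = p1 if x == 1 else 1.0 - p1
    den = p0 if x == 1 else 1.0 - p0
    return num / den


def sequential_e_test(xs, p0=0.5, p1=0.7, alpha=0.05):
    """Multiply per-observation e-values, which forms a test martingale under the null.

    By Ville's inequality, stopping as soon as the running product reaches
    1/alpha yields a level-alpha test that stays valid under optional stopping.
    Returns (stopping index or None, final e-value).
    """
    e_value = 1.0
    for n, x in enumerate(xs, start=1):
        e_value *= bernoulli_lr_evalue(x, p0, p1)
        if e_value >= 1.0 / alpha:
            return n, e_value
    return None, e_value


# Under the null p0 = 0.5, the single-observation e-value has expectation 1.
mean_under_null = (0.5 * bernoulli_lr_evalue(1, 0.5, 0.7)
                   + 0.5 * bernoulli_lr_evalue(0, 0.5, 0.7))
print(mean_under_null, sequential_e_test([1] * 20))
```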
In observational studies, accurately characterizing variance is critical for sample size determination, yet unaccounted-for variability from propensity score estimation and the resulting weights limit the accuracy of standard variance…
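A common quick approximation for the variance cost of weighting is Kish's effective sample size, $n_{\text{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$. It suggests that weighting inflates the variance of a weighted mean by roughly $n / n_{\text{eff}}$. The sketch below computes this quantity for inverse-probability weights built from hypothetical propensity scores. Because it treats the weights as fixed, it ignores exactly the propensity-estimation variability the abstract points to. It is a standard baseline, not the method described there.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size (sum w)^2 / sum w^2, treating weights as fixed."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical estimated propensity scores for three treated units; IPW weights are 1/e.
propensity_treated = [0.8, 0.5, 0.2]
weights = [1.0 / e for e in propensity_treated]
n_eff = kish_effective_sample_size(weights)
design_effect = len(weights) / n_eff  # approximate variance inflation from weighting
print(n_eff, design_effect)
```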
Causal discovery in multivariate extremes is challenging because extreme observations are sparse, dependent, and often affected by latent common shocks. Existing approaches focus on undirected extremal dependence, require prior graph…
Bayes factor sensitivity analysis examines how the evidence for one hypothesis over another depends on the prior distribution. In complex models, the standard approach refits the model at each hyper-parameter value, and the total…
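To make the object of such an analysis concrete, consider a toy conjugate setting: a normal mean with known variance, testing $H_0: \mu = 0$ against $H_1: \mu \sim N(0, \tau^2)$. Here the Bayes factor has the closed form $\mathrm{BF}_{10}(\tau) = N(\bar x; 0, \sigma^2/n + \tau^2) / N(\bar x; 0, \sigma^2/n)$, so the whole sensitivity curve over the prior scale $\tau$ is cheap to trace. The Python sketch below assumes this toy model and made-up summary statistics. In the complex models the abstract targets, each point on such a curve would instead require a refit.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bf10(xbar, n, sigma, tau):
    """Bayes factor for H1: mu ~ N(0, tau^2) versus H0: mu = 0,
    based on the sample mean xbar ~ N(mu, sigma^2 / n)."""
    se2 = sigma ** 2 / n
    return normal_pdf(xbar, se2 + tau ** 2) / normal_pdf(xbar, se2)


# Sensitivity curve over the prior scale tau for hypothetical data
# (xbar = 0.3, n = 50, sigma = 1).
for tau in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"tau={tau:>4}: BF10={bf10(0.3, 50, 1.0, tau):.3f}")
```

In this toy model, very diffuse priors ($\tau \to \infty$) drive $\mathrm{BF}_{10}$ toward zero (the Jeffreys–Lindley effect), which is one reason the curve is worth inspecting.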
Clustering multivariate binary data is of interest in many scientific fields, including ecology, biomedicine, and social policy. Beyond heuristic clustering algorithms, such data can be modelled using multivariate Bernoulli mixture models.…
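The sketch below fits such a mixture by EM in plain Python. It assumes the common latent-class form, in which coordinates are conditionally independent given the cluster: $p(x) = \sum_k \pi_k \prod_j \theta_{kj}^{x_j}(1-\theta_{kj})^{1-x_j}$. The deterministic initialization and the clipping constant `eps` are illustrative choices, not taken from the source. The models in the abstract may allow richer within-cluster dependence.

```python
import math


def bernoulli_mixture_em(X, K, n_iter=50, eps=1e-6):
    """EM for a K-component mixture of independent Bernoullis.

    Returns (mixing weights, success probabilities, log-likelihood history).
    Clipping theta to [eps, 1 - eps] keeps the logs finite; it is a constrained
    M-step, so the log-likelihood remains non-decreasing.
    """
    n, d = len(X), len(X[0])
    # Deterministic, asymmetric starting values so the components can separate.
    theta = [[0.25 + 0.5 * ((k + j) % K) / max(K - 1, 1) for j in range(d)]
             for k in range(K)]
    pi = [1.0 / K] * K
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp, accumulating the log-likelihood.
        resp, loglik = [], 0.0
        for x in X:
            logp = []
            for k in range(K):
                s = math.log(pi[k])
                for j in range(d):
                    s += math.log(theta[k][j]) if x[j] else math.log(1.0 - theta[k][j])
                logp.append(s)
            m = max(logp)
            total = sum(math.exp(v - m) for v in logp)
            loglik += m + math.log(total)
            resp.append([math.exp(v - m) / total for v in logp])
        history.append(loglik)
        # M-step: weighted proportions.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            for j in range(d):
                val = sum(r[k] * x[j] for r, x in zip(resp, X)) / nk
                theta[k][j] = min(max(val, eps), 1.0 - eps)
    return pi, theta, history


# Hypothetical binary data with two apparent groups.
X = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
pi, theta, history = bernoulli_mixture_em(X, K=2)
```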
Spatial orientation is a fundamental cognitive skill that relies on sensory information to update perceived direction. Understanding how sensory conditions influence directional accuracy is important for both cognitive science and the…
Logistic regression is widely used to model the propensity score in the analysis of nonignorable missing data. However, goodness-of-fit testing for this propensity score model has received limited attention in the literature. In this paper,…
A surrogate marker is a biomarker or other physical measurement used to replace a primary outcome in clinical trials to evaluate a treatment effect when the primary outcome of interest is costly, invasive, or takes a long time to observe.…
Variable selection in linear regression has been a central topic in statistical research for decades. Bayesian variable selection methods, which account for uncertainty in both the regression coefficients and the noise variance, have…
In many spatial and spatial-temporal models, and more generally in models with complex dependencies, it may be too difficult to carry out full maximum likelihood (ML) analysis. Remedies include the use of pseudo-likelihood (PL) and…
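For a concrete instance of pseudo-likelihood, consider a toy autologistic (Ising) chain with spins $x_i \in \{-1, +1\}$ and a single interaction parameter $\beta$. On two-dimensional lattices the full likelihood of such models involves an intractable normalizing constant, but each full conditional is simple: $p(x_i \mid x_{-i}) = \exp(\beta x_i s_i) / \{2\cosh(\beta s_i)\}$, where $s_i$ is the sum of the neighbouring spins. The sketch below maximizes Besag's pseudo-log-likelihood $\sum_i \log p(x_i \mid x_{-i})$ by Newton's method. It is an illustration under these assumptions, not one of the models the abstract considers.

```python
import math


def neighbour_sum(x, i):
    """Sum of the left and right neighbours of site i on a free-boundary chain."""
    left = x[i - 1] if i > 0 else 0
    right = x[i + 1] if i < len(x) - 1 else 0
    return left + right


def pseudo_loglik(beta, x):
    """Besag pseudo-log-likelihood: sum of log full conditionals."""
    total = 0.0
    for i, xi in enumerate(x):
        s = neighbour_sum(x, i)
        total += beta * xi * s - math.log(2.0 * math.cosh(beta * s))
    return total


def mple(x, tol=1e-12, max_iter=100):
    """Maximum pseudo-likelihood estimate of beta via Newton's method."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, xi in enumerate(x):
            s = neighbour_sum(x, i)
            th = math.tanh(beta * s)
            score += s * (xi - th)           # first derivative of the i-th term
            info += s * s * (1.0 - th * th)  # minus its second derivative
        if abs(score) < tol:
            break
        beta += score / info
    return beta


spins = [1, 1, -1, 1, 1, 1, -1, -1, 1, 1]  # hypothetical observed configuration
beta_hat = mple(spins)
```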
Multivariate Pearson diffusions are characterized by a linear drift and a diffusion matrix that is quadratic in the state variables. We derive closed-form expressions for the mean and covariance matrix of this class using matrix exponential…
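Because the drift is linear, the mean $m(t) = \mathbb{E}[X_t]$ solves $m'(t) = b + B\,m(t)$ whatever the quadratic diffusion part, provided the stochastic integral has zero mean. Hence $m(t) = e^{Bt}x_0 + \int_0^t e^{Bs}\,ds\,b$. Both terms can be read off a single exponential of the augmented matrix $\begin{pmatrix} Bt & bt \\ 0 & 0 \end{pmatrix}$. The pure-Python sketch below makes three simplifying choices:

- it writes the drift as $b + Bx$ (our notation, not necessarily the paper's);
- it approximates the matrix exponential by simple scaling and squaring with a truncated Taylor series;
- it computes only the mean, not the covariance matrix.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(A, terms=20, squarings=10):
    """Matrix exponential by scaling and squaring with a truncated Taylor series.
    Adequate for small, moderately scaled matrices; not a production routine."""
    n = len(A)
    scale = 2.0 ** squarings
    A_scaled = [[a / scale for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A_scaled^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, A_scaled)]
        result = [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def linear_drift_mean(b, B, x0, t):
    """E[X_t] for dX = (b + B X) dt + (diffusion term), via exp of [[B t, b t], [0, 0]]."""
    d = len(b)
    M = [[B[i][j] * t for j in range(d)] + [b[i] * t] for i in range(d)]
    M.append([0.0] * (d + 1))
    E = expm(M)
    return [sum(E[i][j] * x0[j] for j in range(d)) + E[i][d] for i in range(d)]


# One-dimensional check against the mean-reverting closed form
# mu + (x0 - mu) * exp(-theta * t), with b = theta * mu and B = -theta.
theta, mu, x0, t = 2.0, 1.0, 3.0, 0.5
m = linear_drift_mean([theta * mu], [[-theta]], [x0], t)
```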
Small area estimation (SAE) produces estimates of population parameters for geographic and demographic subgroups with limited sample sizes. Such estimates are critical for informing policy decisions, ranging from poverty mapping to social…
Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing…
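In split conformal prediction with absolute-residual scores, the interval $\hat y \pm \hat q$ takes $\hat q$ to be the $\lceil (n+1)(1-\alpha) \rceil$-th smallest of the $n$ calibration residuals. Under exchangeability, this gives marginal coverage of at least $1-\alpha$. The Python sketch below assumes a fitted point predictor already exists and works with plain lists. It also tabulates empirical coverage within subgroups, a simple diagnostic that can reveal the subgroup under- or overcoverage the abstract describes. It is not the assessment procedure proposed there.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.
    Returns infinity when that rank exceeds n (the interval is then unbounded)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def subgroup_coverage(y, y_hat, groups, q_hat):
    """Empirical coverage of y_hat +/- q_hat within each subgroup label."""
    counts = {}
    for yi, fi, g in zip(y, y_hat, groups):
        covered, total = counts.get(g, (0, 0))
        counts[g] = (covered + (abs(yi - fi) <= q_hat), total + 1)
    return {g: covered / total for g, (covered, total) in counts.items()}


# Hypothetical calibration residuals |y - y_hat| and a small labelled test set.
q_hat = conformal_quantile([0.2, 0.9, 0.4, 1.5, 0.7, 0.3, 1.1, 0.5, 0.8], alpha=0.5)
coverage = subgroup_coverage(
    y=[0.0, 0.0, 0.0, 0.0],
    y_hat=[0.5, 2.0, 0.5, 0.5],
    groups=["a", "a", "b", "b"],
    q_hat=q_hat,
)
```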
Regression with compositional responses is challenging due to the nonlinear geometry of the simplex and the limitations of Euclidean methods. We propose a regression framework for manifold-valued data based on mappings to statistically…
We present balnet, an R package for scalable pathwise estimation of covariate balancing propensity scores via logistic covariate balancing loss functions. Regularization paths are computed with Yang and Hastie (2024)'s generic elastic net…
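The sketch below uses one logistic covariate balancing loss, assumed here for illustration; it is not balnet's code or API:

$$\ell(\beta) = n^{-1}\sum_i \{T_i e^{-x_i^\top\beta} + (1 - T_i)\,x_i^\top\beta\}.$$

Setting its gradient to zero gives $\sum_i T_i\, x_i / \pi(x_i) = \sum_i x_i$ with $\pi(x) = 1/(1 + e^{-x^\top\beta})$. Inverse-propensity weights on the treated units therefore reproduce the full-sample covariate totals exactly. The sketch fits this loss without a penalty by damped Newton on made-up data with an intercept and one covariate. The package instead adds elastic net penalties and computes whole regularization paths.

```python
import math


def balancing_loss(beta, x, t):
    """Logistic covariate balancing loss with features (1, x)."""
    total = 0.0
    for xi, ti in zip(x, t):
        eta = beta[0] + beta[1] * xi
        total += ti * math.exp(-eta) + (1 - ti) * eta
    return total / len(x)


def grad_hess(beta, x, t):
    """Gradient and 2 x 2 Hessian of balancing_loss."""
    n = len(x)
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for xi, ti in zip(x, t):
        eta = beta[0] + beta[1] * xi
        feat = (1.0, xi)
        dcoef = (1 - ti) - ti * math.exp(-eta)  # d loss_i / d eta
        hcoef = ti * math.exp(-eta)             # d^2 loss_i / d eta^2
        for a in range(2):
            g[a] += dcoef * feat[a] / n
            for c in range(2):
                h[a][c] += hcoef * feat[a] * feat[c] / n
    return g, h


def fit_balancing(x, t, tol=1e-10, max_iter=100):
    """Damped Newton minimization of balancing_loss (no penalty)."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        g, h = grad_hess(beta, x, t)
        if math.hypot(g[0], g[1]) < tol:
            break
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                (h[0][0] * g[1] - h[1][0] * g[0]) / det]
        f0 = balancing_loss(beta, x, t)
        decrease = g[0] * step[0] + g[1] * step[1]
        size = 1.0
        while True:  # backtracking (Armijo) line search
            cand = [beta[0] - size * step[0], beta[1] - size * step[1]]
            if balancing_loss(cand, x, t) <= f0 - 1e-4 * size * decrease or size < 1e-12:
                break
            size /= 2.0
        beta = cand
    return beta


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical covariate
t = [0, 1, 0, 1, 0, 1]              # hypothetical treatment indicator
beta = fit_balancing(x, t)
# Weights 1 / pi(x) = 1 + exp(-x'beta) for the treated units.
weights = [1.0 + math.exp(-(beta[0] + beta[1] * xi)) for xi, ti in zip(x, t) if ti]
```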
Ranking geographical or administrative units, such as countries or states, is a well-known approach for comparing developmental progress and informing evidence-based policymaking. Existing ranking methodologies typically rely on a single…
Evidence-informed policy on infections requires estimates of their effects on health. However, pathogenic variation, whereby occurrence of adverse outcomes depends on the infecting strain, might complicate the study of many infectious…
Large Language Models (LLMs) are increasingly used to automate classification tasks in business, such as analyzing customer satisfaction from text. However, the inherent stochasticity of LLMs can create measurement error when the outcome is…
In randomized controlled trials (RCTs) of infectious disease interventions, it is well recognized that unmeasured individual heterogeneity at baseline can induce selection bias over time, thereby complicating the interpretation of the…
We propose a test of the conditional independence of random variables $X$ and $Y$ given $Z$ under the additional assumption that $X$ is stochastically nondecreasing in $Z$. The well-documented hardness of testing conditional independence…