Statistical Methodology
Treatment policy estimands are frequently favored by regulators, as they assess the effect of treatment assignment regardless of post-randomization events. Despite best efforts, missing data due to study discontinuation cannot be fully…
We introduce a statistical framework for combining data from multiple large longitudinal cardiovascular cohorts to enable the study of long-term cardiovascular health starting in early adulthood. Using data from seven cohorts belonging to…
This article introduces a novel framework for nonparametric priors on real-valued random vectors, which can be viewed as a multivariate generalization of neutral-to-the-right priors. It is based on randomizing the exponent measure of a…
This paper introduces Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models. These priors are extensions of traditional mixtures of $g$ priors that allow for differential shrinkage for various…
Cross-validation is commonly used for selecting tuning parameters in penalized regression, but its use in penalized Cox regression models has received relatively little attention in the literature. Due to its partial likelihood…
Penalized regression methods, most notably the lasso, are a popular approach to analyzing high-dimensional data. An attractive property of the lasso is that it naturally performs variable selection. An important area of concern, however, is…
Context: Indirect treatment comparisons (ITC) are essential when direct head-to-head evidence is unavailable. Their reliability depends on rigorous methodological choices and careful assessment of underlying assumptions. Appropriate…
The goal of an experiment is to evaluate the profit, loss, or the amount of a physical entity over a period. The measurements $X_t$ can be influenced by the values measured in the past; hence we describe the situation with an autoregression…
This paper develops a quantitative framework to assess the robustness of Bayes-optimal decisions in finite decision problems under model uncertainty. We introduce two complementary stability notions for acts: the robustness radius,…
High-fidelity (HF) data are often expensive to collect and therefore scarce, making conditional quantiles difficult to estimate accurately. We propose a two-stage, model-agnostic method for multi-fidelity quantile regression. The central…
The simulation of physical phenomena with computer models relies on the estimation of physical and/or numerical parameters calibrated to fit experimental data. The approximations within the computer model and the errors in the measurements…
In many real-world settings such as online recommendation or consumer choice modeling, individuals make repeated choices from a fixed set of options. Accurately estimating their underlying preferences is essential for generating…
False discovery rate (FDR) is a cornerstone of modern multiple testing. However, it often fails to guarantee the reliability of "marginal" discoveries that lie at the boundary of the rejection set, which are often crucial in high-precision…
Methods that rely on proxies, without imposing strong parametric structure, are increasingly used to deal with unobserved variables in causal inference. One influential line of this work reconstructs latent distributions used to identify…
Understanding effect modification -- how treatment effects vary across subpopulations -- is practically important in observational studies, as it helps identify which subgroups are likely to benefit from a given treatment. In this paper, we…
Multi-judge evaluation is increasingly used to assess LLMs and reward models, and the prevailing heuristic is to curate: keep the most accurate judges and discard weaker ones. We show that this heuristic can reverse when the target is not…
Although spatial models for areal data are widely used in multilevel settings, the conditions under which spatial and nonspatial random effects yield equivalent posterior inference for regression coefficients have never been formally…
Marked point process data arise when events occur in a space with event-level marks. We study clustering of replicated marked Poisson point processes and introduce Dirichlet process mixtures of marked Poisson point processes, a Bayesian…
When testing a number of statistical hypotheses using data from location families, it is often useful to control the false discovery rate (FDR) not just for hypotheses of the null values but also of other parameter values that are deemed…
In Bayesian phylogenetics, our goal is to estimate the posterior distribution over phylogenetic trees. Markov chain Monte Carlo methods are widely used to approximate the phylogenetic posterior distributions. For large-scale sequence data,…