Statistics
Mathematical models are invaluable for understanding and predicting how biological systems behave, although their construction requires specifying mechanisms and relationships that are often not perfectly known. In the presence of multiple…
Binary endpoints are common in clinical trials and conditional odds ratios have traditionally been used to assess treatment effects. However, the interpretation of odds ratios is difficult, they are non-collapsible and rely on strong…
Through case studies, we demonstrate how multiverse analysis can strengthen the robustness and transparency of computational social science findings against alternative methodological decisions. We conduct multiverse analyses of three…
Inference about treatment effects for time-to-event outcomes is often obscured by the presence of competing events. A particularly complex situation arises when the treatment influences the occurrence of the competing event. A comprehensive…
Accurately assessing financial risk requires capturing both individual asset volatility and the complex, asymmetric dependence structures that emerge during extreme market events. While modern diffusion-based models have advanced…
Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this…
In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs…
We investigate the asymptotic properties of the L\'evy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the L\'evy Adaptive Regression Kernel (LARK) model. LABS applies…
Roll-call data analysis aims to estimate legislators' ideal points and quantify the associated uncertainty. Existing approaches either rely on Bayesian methods implemented via Markov chain Monte Carlo sampling or focus primarily on point…
We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model's…
Linear regression is widely used to model relationships between responses and predictors. In modern applications, one encounters data where the responses are non-Euclidean random objects situated in a metric space, paired with Euclidean…
Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise,…
Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X…
In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs ($\text{DAGs}$). However, dependency structures can vary across subjects, and ignoring this…
Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations…
Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make exact model versions easier to preserve. I…
Rankings are central to decision-making in fields ranging from education to online platforms, yet classical deterministic methods such as the Borda count method or Copeland-type pairwise methods ignore uncertainty due to sampling noise or…
Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized…
We derive the asymptotic distribution of the spatial Cram'{e}r--von Mises statistic for testing bivariate independence in stationary random fields on $\mathbb{R}^2$ under polynomial $\beta$-mixing dependence, and document the Python…
Clinical prediction models provide a prediction (e.g., estimated risk) for each individual, typically expressed as a point estimate derived from a deterministic function such as a logistic regression equation. Such 'plug-in' predictions…