Statistical Methodology
Time-series stationarity is the property that statistical characteristics such as trend, variance, and seasonality remain constant over time. It is considered fundamental to many forecasting and analysis methods. Different tests detect different…
While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…
Attrition in survey and field experiments presents a challenge for social science research. Common approaches to deal with this problem, such as complete case analysis, multiple imputation, and weighting methods, rely on strong…
Governments routinely adjust capacity in rationed programs such as university fields, medical training and public housing, where admitting one individual displaces others and triggers chains of reallocation. We show that in such settings,…
Watermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random…
Nonlinear stochastic motion presents significant challenges for Bayesian particle tracking. To address this challenge, this paper proposes a framework to construct an invertible transformation that maps the nonlinear state-space model (SSM)…
In many decision-making problems, the primary outcome is expensive, time-consuming, or difficult to observe, so individualized treatment rules (ITRs) may be instead learned from surrogate endpoints. However, a surrogate that is highly…
Inference in extreme value theory relies on a limited number of extreme observations, making estimation challenging. To address this limitation, we propose a non-parametric simulation scheme, the multivariate extreme events spectral…
We present a novel Bayesian spatial disaggregation model for count data, providing fast and flexible inference at high resolution. First, it incorporates non-linear covariate effects using penalized splines, a flexible approach that is not…
Univariate marked Hawkes processes are used to model a range of real-world phenomena including earthquake aftershock sequences, contagious disease spread, content diffusion on social media platforms, and order book dynamics. This paper…
This paper proposes new methodologies for conducting practical differentially private (DP) estimation and inference in high-dimensional linear regression. We first introduce a DP Bayesian Information Criterion (DP-BIC) for selecting the…
Matrix recovery from sparse observations is an extensively studied topic arising in various applications, such as recommendation systems and signal processing, which includes the matrix completion and compressed sensing models as special…
Recent advances in single-cell technologies have deepened our understanding of gene regulation and cellular heterogeneity at single-cell resolution. Single-cell data contain both gene expression levels and the proportion of expressing…
We propose a flexible Bayesian approach for estimating the joint density of a multivariate outcome of interest in the presence of categorical covariates. Leveraging a Gaussian copula framework, our method effectively captures the dependence…
When designing and evaluating an experiment or observational study, it is useful to have a realistic hypothesis regarding the average treatment effect. We present an approach to conceptualizing this average by first considering a…
Background: Mediation analysis is widely used to investigate how treatments and programs exert their effects, but standard ordinary least squares (OLS) inference can be unreliable when regression errors are non-Gaussian. In medical…
Background: Composite endpoints in cardiovascular trials combine heterogeneous outcomes (mortality, nonfatal events, hospitalizations, and biomarkers), yet conventional analytical methods sacrifice information by targeting a single dimension.…
We introduce a general semiparametric clusterwise elliptical distribution to assess how latent cluster structure shapes continuous outcomes. Using a subjectwise representation, we first estimate cluster-specific mean vectors and a…
We propose a novel tensor-on-tensor modeling framework that flexibly models nonlinear voxel-level relationships using Gaussian process (GP) priors, while incorporating the spatial structure of the output tensor through low-rank tensor-based…
This article investigates the model-robustness of fixed-effects models for analyzing a broad class of longitudinal cluster trials (CTs) such as stepped-wedge, parallel-with-baseline and crossover designs, encompassing both randomized (CRTs)…