Statistics
Missing data is pervasive in many scientific domains such as public health, environmental science, and the social sciences. Recoverability from missing data is typically studied using fully specified variable-level missingness models…
The intersection set of Bayesian and nonparametric statistics was almost empty until about 1973, but now is growing at a healthy rate. This chapter, for the {\it Highly Structured Stochastic Systems} book (Oxford University Press, 2003)…
This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether there exist any groups in a set of given data, and if so then, how many groups are prevailing in total. It is a cluster accuracy index…
Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is…
Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness…
During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally…
Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error.…
We develop joint confidence regions for linear regression coefficients when the regressors and errors are jointly stationary and ergodic with unspecified serial dependence. The method applies random smoothing, using an independent auxiliary…
Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and…
Changepoints are essential for homogenizing categorical time series and analyzing their trends and variations. The original total cloud cover in Canada was recorded hourly in tenths (or eighths), exhibiting inherent seasonality and serial…
Causal mediation analysis is essential for disentangling the mechanisms by which investigational therapeutic and preventive agents impact clinical outcomes. However, the measurement of biological mediators is often subject to left-censoring…
Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper…
Background: Often when undertaking meta-analyses of time-to-event (TTE) outcomes, especially in a Health Technology Assessment context, a hazard ratio (HR) scale is used. However, issues arise when there is evidence of non-proportional…
Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural…
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning…
Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for…
The problem of detecting new signals in the presence of an unknown background is ubiquitous in scientific discoveries and is especially prominent in the physical sciences. Most solutions proposed thus far to address the problem focus on…
We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices…
Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However,…
Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian…