Statistical Methodology
Data represented by probability measures arise as empirical distributions, posterior distributions, and feature-based representations of complex objects. We study heterogeneity in a population of probability measures through the expected…
Motivated by parametric models for which the likelihood is analytically unavailable, numerically unstable, or prohibitively expensive to compute or optimize, we develop a prior- and likelihood-free framework for fully probabilistic…
Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel…
Due to the increase in data availability in urban and regional studies, various spatial panel models have emerged for spatial panel data, which exhibit spatial patterns and spatial dependencies between observations across time.…
Label noise - incorrect labels assigned to observations - can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label…
Surrogate markers are often employed in clinical trials to replace primary outcomes that may be difficult, expensive, or time-consuming to measure directly. These markers can accelerate the evaluation of new treatments, provided they…
Robust principal component analysis (RPCA) is a widely used technique for recovering low-rank structure from matrices with missing entries and sparse, possibly large-magnitude corruptions. Although numerous algorithms achieve accurate point…
We study global inference for regression coefficients in high-dimensional linear models under potentially heavy-tailed errors. While sum-type tests are powerful for dense alternatives and max-type tests excel for sparse alternatives,…
Average treatment effects (ATE) and conditional average treatment effects (CATE) are foundational causal estimands, but they target changes in expected outcomes and can miss treatment-induced changes in the shape of outcome distributions. A…
A semiparametric copula-based two-part quantile regression framework is developed for the analysis of semicontinuous outcomes characterized by a point mass at zero and a continuous positive component. The proposed approach models the…
Real-world learning tasks often encounter uncertainty due to covariate shift and noisy or inconsistent labels. However, existing robust learning methods merge these effects into a single distributional uncertainty set. In this work, we…
Model averaging, an appealing ensemble technique, strategically integrates valuable information from candidate models to construct fast and accurate predictions. Despite having been widely practiced in many fields such as…
Understanding how an exposure transmits its effect through high-dimensional intermediaries is a central problem in observational research. We study the problem of finding a composite mediator that maximises the indirect effect of an…
Conditional independence is a fundamental concept in many areas of statistical research, including, for example, sufficient dimension reduction, causal inference, and statistical graphical models. In many modern applications, data arise in…
Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As…
Surrogate models - also called emulators - are widely used to facilitate Bayesian inference in settings where computational costs preclude the use of standard posterior inference algorithms. Their deployment is now standard practice across…
Simultaneous occurrences of extreme events need not imply symmetric or reciprocal tail dependence. However, most existing measures of extremal dependence are inherently symmetric and hence often fail to capture directional influence in tail…
We consider clinical trials in which an experimental treatment is compared with a control in pre-specified patient subpopulations. In such settings, adaptive enrichment designs allow the enrolled population to be modified at an interim…
While variable selection has received extensive attention in the literature, it remains underexplored in the presence of response measurement error. In this paper, we investigate this important problem within the context of…
We investigate robust parameter estimation and testing procedures for multivariate diffusion processes observed at high frequency via the minimum density power divergence estimator (MDPDE). Within a general diffusion framework and under…