Statistical Methodology
Causal inference in modern large-scale systems faces growing challenges, including high-dimensional covariates, multi-valued treatments, massive observational (OBS) data, and limited randomized controlled trial (RCT) samples due to cost…
Learning low-dimensional latent representations is a central topic in statistics and machine learning, and rotation methods have long been used to obtain sparse and interpretable representations. Despite nearly a century of widespread use…
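As a point of reference, varimax is the classic orthogonal rotation. The sketch below uses the standard SVD-based iteration for the orthomax family in NumPy. It is a generic textbook algorithm, not the method proposed in this work, and the function name `varimax` and its parameters are illustrative choices.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation of a p x k loading matrix.

    gamma=1.0 gives the varimax criterion; gamma=0.0 gives quartimax.
    Returns (rotated_loadings, rotation_matrix).
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Column sums of squared loadings, used to centre the criterion's gradient.
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        gradient = loadings.T @ (rotated**3 - (gamma / p) * rotated @ col_ss)
        # The orthogonal matrix closest to the gradient comes from its SVD.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_objective = np.sum(s)
        if objective != 0.0 and new_objective < objective * (1.0 + tol):
            break  # relative improvement below tolerance
        objective = new_objective
    return loadings @ rotation, rotation

# Usage: rotate a random 10 x 3 loading matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
L, R = varimax(A)
```

The rotation is orthogonal, so `L @ L.T` equals `A @ A.T` and the communalities are unchanged. Only the pattern of large and small entries within columns changes, and that pattern is what makes the rotated loadings easier to interpret.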
Time-varying covariates in longitudinal studies frequently evolve through reciprocal feedback, undergo role reversal, and reflect unobserved individual heterogeneity. Standard statistical frameworks often assume fixed covariate roles and…
A conventional Bayesian approach to prediction uses the posterior distribution to integrate out parameters in a density for unobserved data conditional on the observed data and parameters. When the true posterior is intractable, it is…
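In symbols, write $\theta$ for the parameters, $y$ for the observed data and $\tilde{y}$ for the unobserved data (this notation is introduced here for illustration). The conventional posterior predictive density is then the first expression below. The second expression is the usual Monte Carlo approximation, which assumes that draws $\theta^{(1)},\dots,\theta^{(S)}$ from the posterior are available.

```latex
p(\tilde{y} \mid y) = \int p(\tilde{y} \mid y, \theta)\, p(\theta \mid y)\, d\theta,
\qquad
p(\tilde{y} \mid y) \approx \frac{1}{S} \sum_{s=1}^{S} p(\tilde{y} \mid y, \theta^{(s)}),
\quad \theta^{(s)} \sim p(\theta \mid y).
```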
Generalized causal effect estimands, including the Mann-Whitney parameter and causal net benefit, provide flexible summaries of treatment effects in randomized experiments with non-Gaussian or multivariate outcomes. We develop a unified…
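For a scalar outcome, the Mann-Whitney parameter is $P(Y(1) > Y(0)) + \tfrac{1}{2}P(Y(1) = Y(0))$ and the net benefit is $P(Y(1) > Y(0)) - P(Y(1) < Y(0))$, so the net benefit equals twice the Mann-Whitney parameter minus one. The sketch below is a simple plug-in estimator over all treated-control pairs in a two-arm randomized experiment. It assumes scalar outcomes, and it is not the unified framework described in the abstract. Multivariate outcomes would additionally need a rule for ordering outcome vectors.

```python
import numpy as np

def mann_whitney_and_net_benefit(treated, control):
    """Plug-in estimates from all treated-control pairs (a two-sample U-statistic)."""
    t = np.asarray(treated, dtype=float)[:, None]   # column: treated outcomes
    c = np.asarray(control, dtype=float)[None, :]   # row: control outcomes
    p_greater = np.mean(t > c)   # share of pairs where the treated unit wins
    p_tie = np.mean(t == c)      # share of tied pairs
    p_less = np.mean(t < c)      # share of pairs where the control unit wins
    mann_whitney = p_greater + 0.5 * p_tie
    net_benefit = p_greater - p_less
    return mann_whitney, net_benefit

# Usage: 9 pairs, of which 6 favour treatment, 2 are ties and 1 favours control.
mw, nb = mann_whitney_and_net_benefit([3, 4, 5], [1, 3, 4])
```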
Causal inference is valid only when its underlying assumptions hold, and one of the most central is the ignorability, or unconfoundedness, assumption. However, this assumption is often unrealistic in observational studies, as some…
This manuscript provides step-by-step instructions for implementing Bayesian functional regression models using Stan. Extensive simulations indicate that the inferential performance of the methods is comparable to that of state-of-the-art…
This study proposes a novel functional vector autoregressive framework for analyzing network interactions of functional outcomes in panel data settings. In this framework, an individual's outcome function is influenced by the outcomes of…
In online multiple testing, the hypotheses arrive one by one, and at each time we must immediately reject or accept the current hypothesis solely based on the data and hypotheses observed so far. Many online procedures have been proposed,…
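The online protocol can be illustrated with the simplest valid rule, online alpha-spending. This is an illustration only and is not one of the specific procedures this work studies. Each arriving p-value gets its own level, and the decision is final. The level sequence below is a hypothetical choice that sums to alpha. By the union bound, the family-wise error rate is therefore at most alpha. Procedures that target the false discovery rate are typically more powerful.

```python
import math

def online_alpha_spending(p_value_stream, alpha=0.05):
    """Yield an irrevocable reject (True) / accept (False) decision per arriving p-value.

    Hypothesis t is tested at level alpha * 6 / (pi^2 * t^2). These levels sum to
    alpha over t = 1, 2, ..., so by the union bound the family-wise error rate
    is at most alpha, whatever the dependence among the p-values.
    """
    for t, p in enumerate(p_value_stream, start=1):
        level_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        yield p <= level_t  # uses only the current p-value and its index t

# Usage: decisions are produced one at a time, in arrival order.
decisions = list(online_alpha_spending([0.001, 0.5, 0.001, 1e-5]))
```

Because the function is a generator, it can consume an unbounded stream and emit each decision before the next hypothesis arrives.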
Control variates are variance reduction techniques for Monte Carlo estimators. They play a critical role in improving Monte Carlo estimators in scientific and machine learning applications that involve computationally expensive integrals.…
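A minimal example of the idea estimates $E[e^U] = e - 1$ for $U \sim \mathrm{Uniform}(0, 1)$. It uses $g(U) = U$, whose mean $0.5$ is known, as the control variate. The coefficient $\beta = \mathrm{Cov}(f, g)/\mathrm{Var}(g)$ minimises the variance of $f - \beta(g - E[g])$. The function name and the toy integrand are illustrative choices.

```python
import numpy as np

def control_variate_mean(f_vals, g_vals, g_mean):
    """Control-variate estimate of E[f(X)] using g(X) with known mean g_mean.

    beta = Cov(f, g) / Var(g) minimises the variance of f - beta * (g - g_mean);
    here it is estimated from the same draws used for the estimate.
    """
    f = np.asarray(f_vals, dtype=float)
    g = np.asarray(g_vals, dtype=float)
    beta = np.cov(f, g)[0, 1] / np.var(g, ddof=1)
    adjusted = f - beta * (g - g_mean)   # same expectation as f, smaller variance
    return adjusted.mean(), adjusted

# Usage: E[exp(U)] = e - 1 for U ~ Uniform(0, 1), with control g(U) = U and E[U] = 0.5.
rng = np.random.default_rng(1)
u = rng.uniform(size=10_000)
f = np.exp(u)
estimate, adjusted = control_variate_mean(f, u, 0.5)
```

Estimating $\beta$ from the same draws introduces a bias of order $1/n$, which is usually negligible next to the variance reduction.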
This paper develops a class of Bayesian non- and semiparametric methods for estimating regression curves and surfaces. The main idea is to model the regression as locally linear, and then place suitable local priors on the local parameters.…
A class of causal effect functionals requires integration over conditional densities of continuous variables, as in mediation effects and nonparametric identification in causal graphical models. Estimating such densities and evaluating the…
When learning interpretable latent structures using model-based approaches, even small deviations from modeling assumptions can lead to inferential results that are not mechanistically meaningful. In this work, we consider latent structures…
Estimating heterogeneous treatment effects is central to data-driven decision-making, yet industrial applications often face a fundamental tension between limited randomized controlled trial (RCT) budgets and abundant but biased…
Adaptive designs dynamically update treatment probabilities using information accumulated during the experiment. Existing theory for causal inference from adaptive experiments primarily assumes the superpopulation framework with independent…
Estimating the number of people from hidden and/or marginalised populations, such as people dependent on opioids or cocaine, is important to guide policy decisions and the provision of harm reduction services. Methods such as…
Longitudinal data often involve heterogeneity, sparse signals, and contamination from response outliers or high-leverage observations, especially in biomedical science. Existing methods usually address only part of this problem, either…
We consider regression models with data of the type $y_i=m(x_i)+\varepsilon_i$, where the $m(x)$ curve is taken locally constant, with unknown levels and jump points. We investigate the large-sample properties of the minimum least squares…
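In the simplest case of a single jump, $m(x) = a$ for $x \le \tau$ and $m(x) = b$ for $x > \tau$, least squares amounts to trying every split point and keeping the one with the smallest residual sum of squares. The sketch below makes two simplifying assumptions: there is exactly one jump, and the $x_i$ are distinct. The setting above allows several jump points with unknown levels.

```python
import numpy as np

def fit_single_jump(x, y):
    """Least-squares fit of a one-jump step function m(x) = a (x <= tau), b (x > tau).

    Returns (tau, a, b), with tau the largest x in the left segment.
    """
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    best = None
    for k in range(1, len(ys)):  # left segment ys[:k], right segment ys[k:]
        left, right = ys[:k], ys[k:]
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or rss < best[0]:
            best = (rss, xs[k - 1], left.mean(), right.mean())
    return best[1:]

# Usage: true jump at 0.4 from level 1 to level 3, with noise sd 0.1.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.where(x <= 0.4, 1.0, 3.0) + rng.normal(scale=0.1, size=200)
tau, a, b = fit_single_jump(x, y)
```

The exhaustive search costs $O(n^2)$ here. Cumulative sums would reduce it to $O(n)$ without changing the result.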
We examine the optimality properties of the Gini index estimator under complex survey design involving stratification, clustering, and sub-stratification. While Darku et al. (Econometrics, 26, 2020) considered only stratification and…
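For orientation, a common design-weighted plug-in form of the Gini index is $G = \sum_i \sum_j w_i w_j |y_i - y_j| / (2 W^2 \bar{y}_w)$, where $W = \sum_i w_i$ and $\bar{y}_w$ is the weighted mean. The sketch below computes this point estimate. It assumes the weights $w_i$ are survey weights, for example inverse inclusion probabilities. It is not necessarily the estimator analysed by Darku et al. or in this paper. Stratification, clustering and sub-stratification enter through the weights and above all through the estimator's variance, which is not computed here.

```python
import numpy as np

def weighted_gini(y, w):
    """Design-weighted plug-in Gini index.

    G = sum_i sum_j w_i w_j |y_i - y_j| / (2 * W**2 * ybar_w),
    with W = sum_i w_i and ybar_w the weighted mean of y.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    total_w = w.sum()
    ybar_w = np.sum(w * y) / total_w
    abs_diffs = np.abs(y[:, None] - y[None, :])  # n x n matrix of |y_i - y_j|
    return np.sum(np.outer(w, w) * abs_diffs) / (2.0 * total_w**2 * ybar_w)

print(weighted_gini([1, 2, 3, 4], [1, 1, 1, 1]))  # 0.25
```

With integer weights, the estimate equals the unweighted Gini index of the sample in which each unit is replicated $w_i$ times. This is one way to see what the weights do.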
Understanding the structural mechanisms of multi-layer networks is essential for analyzing complex systems characterized by multiple interacting layers. This work studies the problem of estimating connection probabilities in multi-layer…