Statistics Methodology
We study regression problems with distribution-valued responses and mixed distributional and Euclidean predictors. In quadratic cost, the negative gradient of the Kantorovich potential represents, at each source location, the displacement…
Linear regression estimators are known to be sensitive to outliers; one way to obtain a robust and efficient estimator of the regression parameter is to model the error with Student's $t$ distribution. In this article, we…
In experimental and observational data settings, researchers often have limited knowledge of the reasons for missing outcomes. To address this uncertainty, we propose bounds on causal effects for missing outcomes, accommodating the scenario…
Simultaneous variable selection and statistical inference is challenging in high-dimensional data analysis. Most existing post-selection inference methods require explicitly specified regression models, which are often linear, as well as…
Principal component analysis has been a main tool in multivariate analysis for estimating a low dimensional linear subspace that explains most of the variability in the data. However, in high-dimensional regimes, naive estimates of the…
Interrupted time series (ITS) analysis is often used to evaluate the effectiveness of a health policy intervention while accounting for the temporal dependence of outcomes. When the outcome of interest is a percentage or percentile, the data can be…
Clinical and epidemiological studies encode participant information in multivariate vectors with mixed-type variables on continuous, truncated, ordinal, and binary scales. Semiparametric Gaussian Copula (SGC) assumes that observed data is…
Per- and polyfluoroalkyl substances (PFAS) are typically encountered as mixtures of distinct chemicals with distinct effects on multiple health outcomes. Estimating joint causal effects using spatially-dependent observed data is…
To analyze the uncertain data frequently encountered in practice, this paper proposes novel fixed-effects models that incorporate an uncertain measure to investigate variables of interest and nuisance variables in factor designs. First, an…
Over the last decade, nonparametric methods have gained increasing attention for modeling complex data structures due to their flexibility and minimal structural assumptions. In this paper, we study a general multivariate nonparametric…
Flexible distributions for modelling angular data have received considerable attention in recent years, with ongoing work extending existing circular models to provide greater flexibility in capturing diverse angular behaviours. In this…
Modern studies increasingly leverage outcomes predicted by machine learning and artificial intelligence (AI/ML) models, and recent work, such as prediction-powered inference (PPI), has developed valid downstream statistical inference…
In target trial emulation, time partitioning enables researchers to handle time-varying confounders and immortal time bias with appropriate methods. Based on two clinical scenarios, this study aimed to explore issues related to time…
This lecture note provides a self-contained introduction to Bayesian inference and Markov Chain Monte Carlo (MCMC) methods for parameter estimation in epidemic models. Using the classical Susceptible-Infectious-Recovered (SIR) compartmental…
The primary analysis for longitudinal randomized controlled trials (RCTs) often compares treatment groups at the last timepoint, referred to as the landmark time. Assuming data are normally distributed and missing at random, the mixed model…
Two-component mixture models are particularly useful for identifying differentially expressed genes, but their performance can deteriorate markedly when the alternative distribution departs from parametric assumptions or symmetry. We…
In two-phase multiwave sampling, inexpensive measurements are collected on a large sample and expensive, more informative measurements are adaptively obtained on subsets of units across multiple waves. Adaptively collecting the expensive…
Handling missing data in time series is a complex problem due to the presence of temporal dependence. General-purpose imputation methods, while widely used, often distort key statistical properties of the data, such as variance and…
Quantitative assessment of extinction risk requires confidence intervals (CIs) that remain informative with limited data. Their usefulness has long been debated because short observation spans can make uncertainty so large that population…
The Robust Mixture Prior (RMP) is a popular Bayesian dynamic borrowing method, which combines an informative historical distribution with a less informative component (referred to as the robustification component) in a mixture prior to enhance the…