Statistical Methodology
To conduct causal inference in observational settings, researchers must rely on certain identifying assumptions. In practice, these assumptions are unlikely to hold exactly. This paper considers the bias of selection-on-observables,…
High-dimensional data analysis using traditional models suffers from overparameterization. Two types of techniques are commonly used to reduce the number of parameters: regularization and dimension reduction. In this project, we combine…
Among inferential problems in functional data analysis, domain selection is of particular practical interest: it aims to identify the sub-interval(s) of the domain on which the functional features of interest are displayed. Motivated by applications in…
High-throughput pheno-, geno-, and envirotyping allows characterization of plant genotypes and the trials they are evaluated in, producing different types of data. These different data modalities can be integrated into statistical or…
Quantile regression is a powerful tool that offers a richer view of the data than least-squares regression. It is typically performed separately at a few quantile levels or on a grid of quantile levels without…
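The usual practice described above, fitting each quantile level on its own, can be sketched with the standard linear-programming formulation of linear quantile regression, which minimizes the check loss $\rho_\tau(u) = u(\tau - 1\{u < 0\})$. This is a minimal illustration of that baseline, not the method proposed in the abstract; it assumes SciPy's `linprog` with the HiGHS solver, and the function names `fit_linear_quantile` and `fit_quantile_grid` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_linear_quantile(X, y, tau):
    """Linear quantile regression at level tau via the standard LP formulation.

    Minimizes sum_i rho_tau(y_i - x_i' beta) by splitting each residual into
    nonnegative parts: y = X beta + u - v with u, v >= 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Objective: beta is free and costless; positive parts cost tau, negative parts cost 1 - tau.
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def fit_quantile_grid(X, y, taus):
    """Fit each quantile level separately; no information is shared across levels."""
    return np.array([fit_linear_quantile(X, y, t) for t in taus])
```

For example, `fit_quantile_grid(np.column_stack([np.ones(5), np.arange(5.0)]), 1 + 2 * np.arange(5.0), [0.25, 0.5, 0.75])` returns one coefficient row per level. Each row is computed in isolation, which is exactly the "one quantile at a time" practice the abstract refers to.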
The quantile spectrum was introduced in Li (2012; 2014) as an alternative tool for spectral analysis of time series. It can provide a richer view of time series data than the ordinary spectrum, especially…
The quantile-crossing spectrum is the spectrum of quantile-crossing processes, which are obtained from a time series through an indicator function that records whether the series lies above or below a given quantile at each time. This…
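The construction above can be made concrete: threshold the series at its $\alpha$-quantile to get a 0/1 process, then take the spectrum of that process. The sketch below is a crude, assumption-laden illustration rather than the estimator studied in the abstract: it uses the empirical quantile, the convention $u_t = 1\{y_t \le q_\alpha\}$, and the raw (unsmoothed) periodogram of the centered indicators. The helper names are hypothetical.

```python
import numpy as np


def quantile_crossing(y, alpha):
    """Indicator process u_t = 1{y_t <= q_alpha}, with q_alpha the empirical alpha-quantile."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, alpha)
    return (y <= q).astype(float)


def crossing_periodogram(y, alpha):
    """Raw periodogram of the centered quantile-crossing process.

    Entry k corresponds to the Fourier frequency 2*pi*k/n.
    """
    u = quantile_crossing(y, alpha)
    n = u.size
    d = np.fft.fft(u - u.mean())  # discrete Fourier transform of the centered indicators
    return np.abs(d) ** 2 / n
```

Calling `crossing_periodogram(y, 0.9)` on a series `y` gives a picture of the serial dependence in the upper tail; repeating it over several values of `alpha` is what makes the quantile-crossing view richer than a single ordinary spectrum.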
Multivariate Gaussian distributions enjoy Gaussian conditional distributions, which makes conditioning easy: conditioning boils down to evaluating analytical formulae for the conditional means and covariances. For more general distributions,…
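For reference, the Gaussian formulae mentioned above are $\mu_{1\mid 2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\Sigma_{1\mid 2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The sketch below implements them directly with NumPy; the function name `gaussian_conditional` is hypothetical, and it only illustrates the Gaussian case the abstract starts from, not the more general approach it goes on to describe.

```python
import numpy as np


def gaussian_conditional(mu, Sigma, idx_obs, x_obs):
    """Mean and covariance of the free coordinates given x[idx_obs] = x_obs."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    idx_obs = np.asarray(idx_obs)
    idx_free = np.setdiff1d(np.arange(mu.size), idx_obs)

    S11 = Sigma[np.ix_(idx_free, idx_free)]
    S12 = Sigma[np.ix_(idx_free, idx_obs)]
    S22 = Sigma[np.ix_(idx_obs, idx_obs)]

    # K = S12 S22^{-1}, computed with a linear solve instead of an explicit inverse.
    K = np.linalg.solve(S22, S12.T).T
    cond_mean = mu[idx_free] + K @ (np.asarray(x_obs, dtype=float) - mu[idx_obs])
    cond_cov = S11 - K @ S12.T
    return cond_mean, cond_cov
```

For a standard bivariate Gaussian with correlation 0.5, conditioning the first coordinate on the second being 1 gives mean 0.5 and variance 0.75.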
Principal stratification provides a causal inference framework for investigating treatment effects in the presence of a post-treatment variable. Principal strata play a key role in characterizing the treatment effect by identifying groups…
Importance sampling (IS) is an efficient stand-in for model refitting when performing leave-one-out (LOO) cross-validation (CV) on a Bayesian model. IS inverts the Bayesian update for a single observation by reweighting posterior samples. The so-called…
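The reweighting step described above can be sketched in a few lines. Given $S$ draws $\theta_s$ from the full-data posterior, the raw importance weight for leaving out observation $i$ is proportional to $1/p(y_i \mid \theta_s)$, and the resulting LOO predictive density estimate is $p(y_i \mid y_{-i}) \approx \big(\tfrac{1}{S}\sum_s 1/p(y_i \mid \theta_s)\big)^{-1}$. This minimal sketch uses plain, unsmoothed IS on the log scale; it assumes the pointwise log-likelihoods are supplied as an `(S, n)` array, and the function names are hypothetical.

```python
import numpy as np


def _logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def is_loo_elpd(loglik):
    """Pointwise IS-LOO estimates of log p(y_i | y_{-i}).

    loglik[s, i] = log p(y_i | theta_s) for posterior draws theta_s (full data).
    """
    loglik = np.asarray(loglik, dtype=float)
    S = loglik.shape[0]
    # log p(y_i | y_{-i}) ~= log S - log sum_s exp(-loglik[s, i])
    return np.log(S) - _logsumexp(-loglik, axis=0)


def is_loo_weights(loglik, i):
    """Normalized raw importance weights, proportional to 1 / p(y_i | theta_s)."""
    lw = -np.asarray(loglik, dtype=float)[:, i]
    w = np.exp(lw - lw.max())
    return w / w.sum()
```

Summing `is_loo_elpd(loglik)` over observations gives an IS-LOO estimate of the expected log predictive density without refitting. Raw weights like these can have very high variance when one observation is influential, which is why practical tools stabilize them.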
There has been a misconception that only one type of error rate control is necessary in clinical trials, leading to debates over whether to prioritize the familywise error rate (FWER) or the false discovery rate (FDR). This misconception has led to…
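To see concretely why the two criteria are not interchangeable, one can apply a standard FWER procedure (Holm's step-down) and a standard FDR procedure (Benjamini–Hochberg step-up) to the same p-values. This is an illustrative sketch of those two textbook procedures only, not the framework proposed in the abstract; the function names and the example p-values are hypothetical.

```python
import numpy as np


def holm(p, alpha=0.05):
    """Holm step-down procedure (controls FWER). Returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order):
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # step-down: stop at the first hypothesis that is not rejected
    return reject


def benjamini_hochberg(p, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls FDR under independence)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size > 0:
        # step-up: reject every hypothesis up to the largest passing rank
        reject[order[: passing[-1] + 1]] = True
    return reject
```

With p-values `[0.001, 0.012, 0.028, 0.039, 0.2]` at level 0.05, Holm rejects the first two hypotheses while Benjamini–Hochberg rejects the first four: the two procedures answer different questions about the same data.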
A nonparametric method is proposed for estimating the quantile spectra and cross-spectra introduced in Li (2012; 2014) as bivariate functions of frequency and quantile level. The method is based on the quantile discrete Fourier transform…
The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous…
Extreme quantile treatment effects (eQTEs) measure the causal impact of a treatment on the tails of an outcome distribution and are central to studying rare, high-impact events. Standard QTE methods often fail in extreme regimes due to…
We develop an identifiable reduced-rank spatial multinomial model for categorical data with many classes. The model represents class-specific spatial effects through a low-dimensional set of shared latent factors, substantially reducing…
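The dimension reduction described above can be illustrated with a generic reduced-rank multinomial logit. In this sketch, the $n \times (K-1)$ matrix of class-specific spatial effects is written as $F\Lambda^\top$ with $R$ shared latent factors, and class $K$ serves as the reference category. The baseline-category parameterization, the absence of covariates, and all names are assumptions made for illustration. The identifiability constraints the abstract refers to are not reproduced here; note that $F\Lambda^\top$ alone is invariant to invertible transformations of the factors.

```python
import numpy as np


def reduced_rank_multinomial_probs(alpha, F, Lam):
    """Class probabilities when spatial effects are W = F @ Lam.T (rank <= R).

    alpha: (K-1,) class intercepts; F: (n, R) shared latent spatial factors;
    Lam: (K-1, R) class-specific loadings. Class K is the reference (eta = 0).
    """
    alpha = np.asarray(alpha, dtype=float)
    F = np.asarray(F, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    eta = alpha[None, :] + F @ Lam.T                      # (n, K-1) linear predictors
    eta = np.hstack([eta, np.zeros((eta.shape[0], 1))])   # append reference class
    eta -= eta.max(axis=1, keepdims=True)                 # stabilize the softmax
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


def n_spatial_params(n, K, R):
    """Spatial-effect parameter counts: unrestricted n*(K-1) vs reduced rank R*(n+K-1)."""
    return n * (K - 1), R * (n + K - 1)
```

With 1000 locations, 20 classes, and 3 factors, `n_spatial_params(1000, 20, 3)` gives 19000 unrestricted spatial effects versus 3057 factor and loading entries, which illustrates the kind of reduction a shared low-rank structure buys when the number of classes is large.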
This paper proposes tds mgtwr, a multiscale geographically and temporally weighted regression (MGTWR) model with covariate-specific spatial and temporal scales. The approach combines a separable spatio-temporal kernel with a Top-Down Scale…
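The basic building block of geographically and temporally weighted regression is a local weighted least-squares fit at a focal point $(u_0, t_0)$, with weights taken from a separable kernel $w_i = K_s(\lVert u_i - u_0 \rVert / h_s)\,K_t(\lvert t_i - t_0 \rvert / h_t)$. The sketch below shows only that building block, with one shared pair of bandwidths. Gaussian kernels are an assumption, the helper names are hypothetical, and neither the covariate-specific scales of MGTWR nor the Top-Down Scale selection procedure is implemented.

```python
import numpy as np


def separable_weights(coords, times, u0, t0, h_s, h_t):
    """Separable spatio-temporal Gaussian kernel weights for focal point (u0, t0)."""
    d_s = np.linalg.norm(np.asarray(coords, dtype=float) - np.asarray(u0, dtype=float), axis=1)
    d_t = np.abs(np.asarray(times, dtype=float) - float(t0))
    # Product of a spatial kernel and a temporal kernel (the "separable" part).
    return np.exp(-0.5 * (d_s / h_s) ** 2) * np.exp(-0.5 * (d_t / h_t) ** 2)


def local_fit(X, y, w):
    """Weighted least squares at one focal point: beta = (X'WX)^{-1} X'Wy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtW = X.T * w  # multiplies column i of X' by w_i, i.e. X' diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)
```

Repeating `local_fit` over many focal points yields coefficient surfaces that vary in space and time. A multiscale variant would instead give each covariate its own $(h_s, h_t)$, which is what the covariate-specific scales in the abstract refer to.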
Single-index models or time-to-event models are frequently applied in empirical research. These models are non-identifiable in the presence of unknown (dependent) censoring or competing risks and do not give informative results in empirical…
Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation (OPE) provides a powerful framework for assessing such decision-making policies…
We study layer-specific community detection in an $L$-layer network $\{A^{(l)}\}_{l\in[L]}$ on a common set of $n$ nodes. Because modern networks are constructed from multi-modal data or with different contexts, the community labels…
Integrative analysis of multivariate functional time series (MFTS) is both critical and challenging across many scientific domains. Such data often exhibit complex multi-way dependencies arising from within-curve structures, temporal…