Statistics Methodology
We study the estimation of the leverage effect and the volatility of volatility using high-frequency data in the presence of jumps. We first construct a spot volatility estimator using the empirical characteristic function of the…
Understanding how neurons coordinate their activity is a fundamental question in neuroscience, with implications for learning, memory, and neurological disorders. Calcium imaging has emerged as a powerful method to observe large-scale…
Heterogeneous, mixed-type datasets including both continuous and categorical variables are ubiquitous and enrich data analysis by allowing more complex relationships and interactions to be modelled. Mixture models offer a flexible…
Many computer simulations are stochastic and exhibit input-dependent noise. In such situations, heteroskedastic Gaussian processes (hetGPs) make ideal surrogates as they estimate a latent, non-constant variance. However, existing hetGP…
For many health conditions, there are highly efficacious treatment and prevention products. Maximizing their impact requires strategies that improve the reach of health screening in order to establish who could benefit. For example, HIV…
Compositional regression models with a real-valued response variable can generally be specified as log-contrast models subject to a zero-sum constraint on the model coefficients. This formulation emphasises the relative information conveyed…
A fundamental problem in statistics is measuring the correlation between two rankings of a set of items. Kendall's $\tau$ and Spearman's $\rho$ are well-established correlation coefficients whose symmetric structure guarantees zero expected…
Sparse covariance matrices play crucial roles by encoding the interdependencies between variables in numerous fields such as genetics and neuroscience. Despite substantial studies on sparse covariance matrices, existing methods face several…
Treatment effect heterogeneity with respect to covariates is common in instrumental variable (IV) analyses. An intuitive approach, which we call the interacted two-stage least squares (2SLS), is to postulate a working linear model of the…
Comparing yield quality distributions across multiple agricultural fields is fundamental for evaluating management practices, yet it is complicated by two pervasive data characteristics: non-normality and spatial autocorrelation.…
Social network interference induces complex dependencies where a unit's outcome is influenced not only by its own exposure and mediator but also by those of connected neighbors. In such settings, a significant challenge lies in…
Model selection in the presence of intractable likelihoods remains a central challenge in Bayesian inference. Approximate Bayesian computation (ABC) provides a flexible likelihood-free framework, but its use for model choice is known to be…
Composite endpoints consisting of both terminal and non-terminal events, such as death and hospitalization, are frequently used in cardiovascular clinical trials. The Finkelstein-Schoenfeld (FS) test provides a way to employ a hierarchical…
The Jacobi prior offers an alternative Bayesian framework, designed to achieve superior computational efficiency without compromising predictive performance. Compared to widely used methods such as Lasso, Ridge, Elastic Net, and uniLasso, the…
In numerous instances, the generalized exponential distribution can be used as an alternative to the most widely used non-regular families of distributions (Weibull, gamma, and lognormal with three parameters) when analyzing lifetime or any skewed…
Quantifying the heterogeneity of treatment effect is important for understanding how a commercial product or medical treatment affects different population subgroups. While much of treatment effect heterogeneity analysis focuses on the…
Within the framework of smoothing spline ANOVA, we propose a plug-in kernel ridge regression estimator to estimate the derivatives of the underlying multivariate regression function. We first establish an $L_\infty$ convergence rate of the…
Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and…
Deconvolution is the important problem of estimating the distribution of a quantity of interest from a sample with additive measurement error. Nearly all methods in the literature are based on Fourier transformation because it is…
In a typical two-phase design, a random sample is drawn from the target population in phase 1, during which only a subset of variables is collected. In phase 2, a subsample of the phase-1 cohort is selected, and additional variables are…