Statistical Methodology
Existing mortality forecasting methods focus on age-specific mortality rates, which lie in an unconstrained space and overlook the distributional nature of life-table death counts. Few studies have developed and compared forecasting methods…
Pre-smoothing is a technique aimed at increasing the signal-to-noise ratio in data to improve subsequent estimation and model selection in regression problems. However, pre-smoothing has thus far been limited to the univariate response…
There is increasing interest in combining information from experimental studies, including randomized and single-group trials, with information from external experimental or observational data sources. Such efforts are usually motivated by…
This work introduces a new method for selecting the number of components in finite mixture models (FMMs) using variational Bayes, inspired by the large-sample properties of the Evidence Lower Bound (ELBO) derived from mean-field (MF)…
Bayesian inference with empirical likelihood faces a challenge as the posterior domain is a proper subset of the original parameter space due to the convex hull constraint. We propose a regularized exponentially tilted empirical likelihood…
Mixed-effects logistic regression is widely used for binary outcomes in hierarchical data, yet formal goodness-of-fit tests remain limited to random-intercept models and do not address sparse cluster settings. We extend a grouping-based…
In the FDR-controlling literature, mirror statistics offer a flexible alternative to $p$-value based procedures. When prior information is available, however, it is unclear how to incorporate mirror statistics in a principled way, and the…
The multivariate generalised Gaussian distribution (MGGD) is commonly used to model high-dimensional vectors with non-Gaussian radial behaviour, ranging from sharp-peaked to heavy-tailed profiles. However, because many classical…
This paper proposes an extension to discrete Phase-Type distributions (DPH) by introducing random rewards. These allow for modeling a system in which a visit to a certain state does not emit a deterministic reward. Instead, the rewards…
We propose a nonparametric approach to testing conditional independence and estimating conditional association, generalizing the Cochran-Mantel-Haenszel (CMH) test and odds-ratio estimator to continuous sample spaces. It leverages a…
Statistical analysis of network data has attracted considerable attention in recent years, due to the rapid advancement of well-trained network models and the accessibility of large public network datasets. In this article, we propose a…
Loss-based priors assign probability mass to parameter values according to the inferential loss incurred when they are excluded from the parameter space, and provide a general solution for discrete parameters. Extending this idea to…
Penalized generalized estimating equations (PGEE) stabilize point estimation for longitudinal binary data under near-separation, but inference still depends on how the sandwich variance is corrected. Existing corrections for PGEE can…
This paper develops a nonparametric density estimator with parametric overtones. Suppose $f(x,\theta)$ is some family of densities, indexed by a vector of parameters $\theta$. We define a local kernel smoothed likelihood function which for…
Analyzing correlation between variables is often both the tool and the goal of modern science. A crucial question is whether the correlation between two variables is a direct correlation or only an indirect correlation through a confounder.…
Random-effects meta-analysis summarizes heterogeneous trials by estimating an average effect over the observed evidence base, which may not represent the clinically relevant target population. In cardiovascular medicine, treatment effects…
Classical latent-score ranking models often fail to distinguish objects' intrinsic scores from contextual effects, which are typically nonlinear and can dominate the observed outcomes. To address this, we introduce a semiparametric ranking…
How should researchers conduct causal inference when the outcome of interest is latent and measured imperfectly by multiple indicators? We develop a general nonparametric framework for identifying and estimating average treatment effects on…
In many classification problems, misclassification costs are highly asymmetric, while training labels are often corrupted due to measurement error, annotator variability, or adversarial noise. The Neyman-Pearson multiclass classification…
Accurately predicting hospital readmission risks using electronic health records (EHRs) is critical for effective patient management and healthcare resource allocation. Patient populations in health systems are highly heterogeneous across…