Statistical Methodology
Non-negative matrix factorization (NMF) is widely used for parts-based representations, yet formal inference for covariate effects is rarely available when the basis is learned under non-negativity. We introduce non-negative matrix…
Kernel density estimators for circular data have been studied extensively for decades, as they allow flexible estimation even when the shape of the underlying density is complex. Many recent studies have examined bias correction methods;…
Integrated IPD-AD analysis, which combines individual participant data (IPD) with aggregate data (AD), is increasingly recognized as an effective strategy for generating more reliable and generalizable inferences from heterogeneous studies.…
Circular variables that represent directions or periodic observations arise in many fields, such as biology and environmental sciences. An important issue when dealing with circular data is how to estimate their dispersion robustly,…
A fundamental challenge in causal inference with observational data is correct specification of a causal model. When there is model uncertainty, analysts may seek to use estimates from multiple candidate models that rely on distinct, and…
Net survival is conventionally defined as ``survival if cancer were the only possible cause of death'', an estimand corresponding to cancer-specific mortality alone. The Pohar Perme estimator targets this by removing general population…
High-dimensional feature selection is routinely required to balance statistical power with strict control of multiple-error metrics such as the k-Family-Wise Error Rate (k-FWER) and the False Discovery Proportion (FDP), yet some existing…
Envelope models provide a sufficient dimension reduction framework for multivariate regression analysis. Bayesian inference for these models has been developed primarily using Markov chain Monte Carlo (MCMC) methods. Specifically, Gibbs…
Comparing multivariate yield quality distributions across spatially referenced agricultural fields is complicated by two pervasive features: non-normality and spatial autocorrelation. Classical procedures such as ANOVA, MANOVA, and standard…
Network meta-analysis (NMA) is widely used to compare multiple interventions simultaneously by synthesizing direct and indirect evidence. The general fixed or random effects contrast-based NMA model can be applied to different outcomes and…
Accurate power and sample size (PSS) calculations are essential for designing studies that use quasi-likelihood (QL) models, which extend generalized linear models (GLMs) to settings where the full distribution of the outcome is not…
There is recent interest in estimating the false discovery rate (FDR) from published p-values. However, there is little formal research that addresses the manner and extent to which the presumed selection, or publication, bias model impacts…
Bayesian inference in generalized linear models requires a prior on the coefficient vector $\beta$. Practitioners naturally reason about response probabilities at specific covariate values, not about abstract log-odds parameters. We develop…
In observational studies, causal inference becomes difficult when confounders are missing-not-at-random (MNAR), particularly where the missingness depends on the confounder's own unreported value (self-masking). Existing methods for…
With the increasing availability of data objects in the form of probability distributions, there is a growing need for statistical methods tailored to distributional data. Distance measures, especially the pairwise distance matrix between…
We propose the CliPS procedure when fitting Bayesian mixture models in the context of model-based clustering to identify the cluster distributions while simultaneously assessing the suitability of a cluster solution and validating the…
We are honoured to have our work read and discussed at such a thorough level by several experts. Words of appreciation and encouragement are gratefully received, while the many supplementary comments, thoughtful reminders, new perspectives…
We develop an improvement to conditional logistic regression (CLR) in the setting where the parameter of interest is the additive effect of a binary treatment on the log-odds of the positive level of a binary response. Our improvement is…
The Hoover index is a widely used measure of inequality with an intuitive interpretation, yet little is known about the finite-sample properties of its empirical estimator. In this paper, we derive a simple expression for the expected value…
Generalized linear models (GLMs) are fundamental tools for statistical modeling, with maximum likelihood estimation (MLE) serving as the classical approach for parameter inference. While MLE performs well for canonical GLMs, it can become…