Statistical Methodology
K-means clustering, a classic and widely used clustering technique, is known to perform suboptimally on data that are not linearly separable. Numerous adjustments and modifications have been proposed to address this issue,…
Discrete Choice Experiments (DCEs) investigate participants' preferences by observing their choice behavior in hypothetical scenarios and are widely used in the domain of healthcare. To reduce participants' cognitive burden, especially when…
Sequential estimators are proposed for the relative risk, odds ratio, log relative risk or log odds ratio of a dichotomous attribute in two populations. The estimators take the same number of observations from each population, and guarantee…
Dirichlet distributions are probability measures on the unit simplex. They are often used as prior distributions in modeling categorical data, such as in topic analysis of text data. Motivated by this application, we consider Monte Carlo…
In dependently censored survival data, the usual assumption of independent censoring or an incorrect specification of the correlation between the event and censoring times can bias marginal survival inference. Likelihood-based estimation of…
Cardiovascular diseases are major causes of mortality globally. They often co-occur and are interrelated, leading to partial-order relationships among their onset times. However, these onset times are subject to informative censoring due to…
In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric…
The estimand framework provides guidance on handling intercurrent events, such as treatment discontinuation, in the analysis of clinical trial responses. Under ICH E9(R1), the treatment policy (TP) strategy incorporates post-discontinuation…
This work reconciles two perspectives on the Elo ranking that coexist in the literature: the practitioner's view as a heuristic feedback rule, and the statistician's view as online maximum likelihood estimation via stochastic gradient…
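The two views of Elo named in the entry above can be checked against each other numerically. The following is a minimal sketch, assuming the conventional base-10 logistic win probability with a 400-point scale and K = 32; these constants and all function names are illustrative assumptions and are not taken from the paper. Under those assumptions, one Elo update is the same as one stochastic gradient ascent step on the log-likelihood of a single game, with learning rate K * 400 / ln(10).

```python
import math


def expected_score(r_a, r_b):
    """Win probability of player A under the base-10 logistic (Elo) model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, s_a, k=32.0):
    """Practitioner's rule: move ratings by K times (actual - expected score)."""
    delta = k * (s_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def log_likelihood(r_a, r_b, s_a):
    """Bernoulli log-likelihood of one game outcome s_a in {0, 1}."""
    p = expected_score(r_a, r_b)
    return s_a * math.log(p) + (1.0 - s_a) * math.log(1.0 - p)


def sgd_update(r_a, r_b, s_a, lr):
    """One gradient ascent step on the single-game log-likelihood.

    d/dr_a log L = (ln 10 / 400) * (s_a - p); the gradient for r_b is its negative.
    """
    grad_a = (math.log(10.0) / 400.0) * (s_a - expected_score(r_a, r_b))
    return r_a + lr * grad_a, r_b - lr * grad_a


if __name__ == "__main__":
    k = 32.0
    lr = k * 400.0 / math.log(10.0)  # learning rate that reproduces Elo's K
    print(elo_update(1500.0, 1600.0, 1.0, k))
    print(sgd_update(1500.0, 1600.0, 1.0, lr))
```

The draw case (a score of 0.5) is left out of the likelihood here; the update rule itself accepts it unchanged.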
Accounting for both rare events and complex sampling presents challenges when quantifying uncertainty for rate estimation in autonomous vehicle performance evaluation. In this paper, we introduce a statistical formulation of this problem…
Testing for normality is a widely used procedure in statistics and data analysis, often applied prior to employing methods that rely on the assumption of normally distributed data. While several existing tests target distributional…
Spherically embedded time series are time series whose values naturally reside on, or can be equivalently mapped to, the sphere. Despite their ubiquity in diverse scientific fields, these data frequently exhibit complex non-stationarity…
Regression discontinuity (RD) analysis with latent variables, as introduced by Morell et al. (2025), offers a useful augmentation of conventional RD by incorporating a measurement model. This approach is particularly relevant in education…
Quantitative research in the social and behavioral sciences relies heavily on nonlinear posterior functionals such as indirect effects, standardized coefficients, effect sizes, intraclass correlations, and multilevel variance-explained…
Neuron-level firing data is believed to be governed by latent activation patterns during task completion. Analysing repeated trials of a task allows us to study these patterns, typically by averaging in-vivo neural spikes across trials.…
Statistical inference has undergone a profound transformation over the past decade, evolving from a significance-testing paradigm toward a comprehensive, transparency-driven framework embedded within the broader open science ecosystem.…
The Weibull distribution is widely used in modelling health data. However, its lack of sufficient tail flexibility often results in a poor fit for extreme events. We propose another three-parameter extension of the Weibull distribution with…
Underpowered studies (below 50% power) suffer from the winner's curse: A statistically significant positive estimate must exaggerate the true treatment effect to meet the significance threshold. A study by Dipayan Biswas, Annika Abell, and…
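The winner's curse described in the entry above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis from the paper: estimates are drawn from a normal distribution around a hypothetical true effect of 1.0 with standard error 1.0 (two-sided power of roughly 17% at the 5% level), and all names and numbers are illustrative. Because power below 50% implies that the true effect lies below the significance cutoff, every significant positive estimate overstates it.

```python
import random
from statistics import NormalDist, mean


def power_two_sided(true_effect, se, alpha=0.05):
    """Power of a two-sided z-test when the estimate is N(true_effect, se^2)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def significant_positive_estimates(true_effect, se, n_studies=100_000,
                                   alpha=0.05, seed=0):
    """Simulate study estimates and keep those that are significant and positive."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_studies))
    return [est for est in estimates if est / se > z_crit]


# Hypothetical values chosen only to give an underpowered design.
TRUE_EFFECT, SE = 1.0, 1.0

if __name__ == "__main__":
    kept = significant_positive_estimates(TRUE_EFFECT, SE)
    print("power:", round(power_two_sided(TRUE_EFFECT, SE), 3))
    print("share significant and positive:", len(kept) / 100_000)
    print("mean exaggeration ratio:", mean(kept) / TRUE_EFFECT)
```

The exaggeration ratio printed at the end is the average significant estimate divided by the true effect; it shrinks toward 1 as the standard error falls and power rises.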
The notion of causal effect is fundamental across many scientific disciplines. Traditionally, quantitative researchers have studied causal effects at the level of variables; for example, how a certain drug dose (W) causally affects a…
We propose a two-sample mean test based on the Bayes factor with non-informative priors, specifically designed for scenarios where the dimension $p$ grows with the sample size $n$ with a linear rate $p/n \to c_1 \in (0, \infty)$. We…