Statistical Methodology
Optimizing complex manufacturing processes often involves a trade-off between data accuracy and acquisition cost. High-fidelity data are accurate but limited, while low-fidelity data are abundant but often biased. Balancing these two…
Transporting findings from a study population to a target population is central to evidence-based decision-making in real-world settings. Most existing methods require individual-level data from both populations to account for covariate…
Estimating causal effects is particularly challenging when outcomes arise in complex, non-Euclidean spaces, where conventional methods often fail to capture meaningful structural variation. We develop a framework for topological causal…
The overlapping coefficient is a fundamental measure of similarity between probability distributions. While the case of two distributions has been extensively studied, extending this measure to multiple populations presents both analytical…
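As a rough illustration of the quantity discussed in the abstract above: for two densities f and g, the overlapping coefficient is the integral of min(f(x), g(x)). The sketch below approximates that integral with a midpoint rule. Taking the pointwise minimum over more than two densities is only one possible multi-population generalization, assumed here for illustration; the truncated abstract does not say which definition the paper adopts. The function name `overlapping_coefficient` and the integration limits are hypothetical choices.

```python
from statistics import NormalDist


def overlapping_coefficient(pdfs, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of min_k f_k(x) on [lower, upper].

    `pdfs` is a list of density functions. With two densities this is the
    usual overlapping coefficient; with more, it is an assumed generalization.
    """
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width          # midpoint of the i-th cell
        total += min(pdf(x) for pdf in pdfs)   # pointwise minimum density
    return total * width


f = NormalDist(0.0, 1.0)
g = NormalDist(1.0, 1.0)
h = NormalDist(2.0, 1.0)

# Two populations: should agree with the closed form in the standard library.
ovl_two = overlapping_coefficient([f.pdf, g.pdf], -10.0, 11.0)

# Three populations under the assumed min-based generalization.
ovl_three = overlapping_coefficient([f.pdf, g.pdf, h.pdf], -10.0, 12.0)

print(ovl_two, f.overlap(g), ovl_three)
```

For two normal distributions, `NormalDist.overlap` in the Python standard library gives the same quantity in closed form, which makes it a convenient check on the numerical integral.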
This paper develops a novel change point identification method for high-dimensional data using random projections. By projecting high-dimensional time series into a one-dimensional space, we are able to leverage the rich literature for…
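The projection idea in the abstract above can be sketched as follows: draw a random unit direction, project each high-dimensional observation onto it, and run a univariate change point method on the resulting series. The univariate step below is a standard CUSUM statistic for a single mean shift, swapped in purely for illustration; the truncated abstract does not say which one-dimensional methods the paper uses, and all function names here are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(series, direction):
    """Project each observation (a list of length dim) onto the direction."""
    return [sum(x * w for x, w in zip(row, direction)) for row in series]


def cusum_change_point(values):
    """Return (t, stat): the split after the first t points that maximizes
    the scaled CUSUM statistic for a single change in mean."""
    n = len(values)
    total = sum(values)
    best_t, best_stat = None, -1.0
    running = 0.0
    for t in range(1, n):
        running += values[t - 1]
        stat = abs(running - t * total / n) / math.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


# Simulated example: 200 observations in 50 dimensions, mean shift after t = 100.
rng = random.Random(1)
dim, n, shift_at = 50, 200, 100
series = [
    [rng.gauss(0.0 if t < shift_at else 1.0, 1.0) for _ in range(dim)]
    for t in range(n)
]
direction = random_unit_vector(dim, rng)
estimate, statistic = cusum_change_point(project(series, direction))
print(estimate, statistic)
```

A single random direction can happen to be nearly orthogonal to the shift, so in practice one would likely combine several projections; how to do so is not specified in the excerpt.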
Tukey's boxplot is widely used for outlier detection; however, its classic fixed-fence rule tends to flag an excessive number of outliers as the sample size grows. To address this, we introduce two new R packages, ChauBoxplot and…
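For reference, the classic fixed-fence rule mentioned above flags any point outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]. For normally distributed data these fences sit at roughly ±2.7 standard deviations, so about 0.7% of clean observations are flagged and the number of flagged points grows in proportion to the sample size. The sketch below shows only this classic rule, not the adjusted rules implemented in the R packages named in the abstract; the function names are hypothetical.

```python
import random
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Classic Tukey fences: (Q1 - k * IQR, Q3 + k * IQR)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the points lying strictly outside the fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# The same rule applied to clean Gaussian samples of increasing size.
rng = random.Random(0)
counts = {}
for n in (100, 10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    counts[n] = len(flag_outliers(sample))
print(counts)
```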
In this tutorial, we provide a hands-on guideline on how to implement complex Dynamic Latent Class Structural Equation Models (DLCSEM) in the Bayesian software JAGS. We provide building blocks starting with simple Confirmatory Factor and…
Probability distributions defined on the unit interval are widely used in fields ranging from econometrics to reliability studies. Traditional models such as the beta and Kumaraswamy distributions are well-established due to their…
Standard methods for determining the number of factors often overestimate the true number when data exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as genuine factors. This paper addresses this challenge within the…
We study a networked system of innovation processes, where each process is modeled as an urn with infinitely many colors, a classical framework for capturing the emergence of novelties. Extending this paradigm, we analyze a model of…
The occurrence of atypical circular observations on the torus can badly affect parameter estimation of the multivariate von Mises distribution. This paper addresses the problem of robust fitting of the multivariate von Mises model using the…
I present all the details in calculating the posterior distribution of the conjugate Normal-Gamma prior in Bayesian Linear Models (BLM), including correlated observations, prediction, model selection and comments on efficient numeric…
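To make the conjugate update concrete, here is a minimal sketch for the simplest special case only: independent observations with unknown mean and precision, i.e. an intercept-only model. It assumes the parameterization mu | tau ~ N(mu0, 1/(kappa0 * tau)) and tau ~ Gamma(alpha0, rate = beta0). The general linear model with correlated observations treated in the paper needs the matrix versions of these formulas, which are not reproduced here; the function name is hypothetical.

```python
def normal_gamma_posterior(data, mu0, kappa0, alpha0, beta0):
    """Posterior Normal-Gamma parameters for i.i.d. normal data.

    Assumed prior: mu | tau ~ N(mu0, 1 / (kappa0 * tau)),
                   tau ~ Gamma(alpha0, rate=beta0).
    Returns (mu_n, kappa_n, alpha_n, beta_n).
    """
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)  # within-sample sum of squares

    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * mean) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = (
        beta0
        + 0.5 * sum_sq
        + kappa0 * n * (mean - mu0) ** 2 / (2 * kappa_n)
    )
    return mu_n, kappa_n, alpha_n, beta_n


posterior = normal_gamma_posterior([1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0,
                                   alpha0=1.0, beta0=1.0)
print(posterior)  # (1.5, 4.0, 2.5, 3.5)
```

The posterior mean `mu_n` is a precision-weighted average of the prior mean and the sample mean, which is the behavior the matrix formulas generalize.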
Assessing fit in common factor models solely through the lens of mean and covariance structures, as is commonly done with conventional goodness-of-fit (GOF) assessments, may overlook critical aspects of misfit, potentially leading to…
The disaggregated time-series for the Consumer Price Index (CPI) often exhibits exact zero price changes, stemming from structural features of the data collection process. However, the currently prominent stochastic volatility model of…
Stochastic gradient descent (SGD) is a scalable and memory-efficient optimization algorithm for large datasets and stream data, which has drawn a great deal of attention and popularity. The applications of SGD-based estimators to…
Controlling the false discovery rate (FDR) in variable selection becomes challenging when predictors are correlated, as existing methods often exclude all members of correlated groups and consequently perform poorly for prediction. We…
Stepped-wedge cluster randomised trials (SW-CRTs) increasingly evaluate complex interventions, yet methodological guidance for analysing composite endpoints using generalized pairwise comparisons (GPC) remains limited. This work investigates…
Ordinal measurements are common outcomes in studies within psychology, as well as in the social and behavioral sciences. Choosing an appropriate regression model for analysing such data poses a difficult task. This paper aims to facilitate…
We have developed and tested a spatial scan statistic for categorical, functional data (CFSS), a data structure within which current approaches cannot identify spatial clusters. Our methodology combines an encoding scheme for categorical,…
Bayesian sample size calculations in clinical trials usually rely on complex Monte Carlo simulations in practice. Obtaining bounds on Bayesian notions of the false-positive rate and power often lacks closed-form or approximate numerical…