统计方法学
We propose sequential transport (ST), a distributional framework for mediation analysis that combines optimal transport (OT) with a mediator directed acyclic graph (DAG). Instead of relying on cross-world counterfactual assumptions, ST…
This dissertation presents a general framework for changepoint detection based on L0 model selection. The core method, Iteratively Reweighted Fused Lasso (IRFL), improves upon the generalized lasso by adaptively reweighting penalties to…
Comparative judgement studies elicit quality assessments through pairwise comparisons, typically analysed using the Bradley-Terry model. A challenge in these studies is experimental design, specifically, determining the optimal pairs to…
Estimators in statistics and machine learning must typically trade off between efficiency, having low variance for a fixed target, and distributional robustness, such as multiaccuracy, or having low bias over a range of possible targets. In…
Mobile health (mHealth) leverages digital technologies, such as mobile phones, to capture objective, frequent, and real-world digital phenotypes from individuals, enabling the delivery of tailored interventions to accommodate substantial…
Monitoring for changes in a predictive relationship represented by a fitted supervised learning model (i.e., concept drift detection) is a widespread problem in modern data-driven applications. A general and powerful Fisher score-based…
Gaussian processes are a powerful class of non-linear models, but have limited applicability for larger datasets due to their high computational complexity. In such cases, approximate methods are required, for example, the recently…
Causal inference literature has extensively focused on binary treatments, with relatively fewer methods developed for multi-valued treatments. In particular, methods for multiple simultaneously assigned treatments remain understudied…
We study the problem of linear feature selection when features are highly correlated. Such settings pose two fundamental challenges. First, how should model similarity be defined? Simply counting features in common can be misleading: two…
Spontaneous reporting system databases are key resources for post-marketing surveillance, providing real-world evidence (RWE) on the adverse events (AEs) of regulated drugs or other medical products. Various statistical methods have been…
Over the past decades, the increasing dimensionality of data has increased the need for effective data decomposition methods. Existing approaches, however, often rely on linear models or lack sufficient interpretability or flexibility. To…
Quantifying causal effects in the presence of complex and multivariate outcomes remains a key challenge in treatment evaluation. For hierarchical multivariate outcomes, the FDA recommends the Win Ratio and Generalized Pairwise Comparisons…
We derive explicit formulas for Kendall's tau and Spearman's rho for two broad classes of asymmetric copulas: normal location-scale mixture copulas and skew-normal scale mixture copulas. These classes encompass widely used specifications,…
Regression models have a substantial impact on interpretation of treatments, genetic characteristics and other potential risk factors in survival analysis. In many applications, the description of censoring and survival curve reveals the…
The question that drives this research is: "How to discover the number of respondents that are necessary to validate items of a questionnaire as actually essential to reach the questionnaire's proposal?" Among the efforts in this subject,…
Online controlled experiments (A/B testing) are fundamental to data-driven decision-making in many companies. Improving the sensitivity of these experiments under fixed sample size constraints requires reducing the variance of the average…
The spatial linear mixed model (SLMM) consists of fixed and spatial random effects that may be linearly dependent. Partially motivated as a means to address potential issues with confounding, the Restricted spatial regression (RSR) model…
Functional principal component analysis (FPCA) is a key tool in the study of functional data, driving both exploratory analyses and feature construction for use in formal modeling and testing procedures. However, existing methods for FPCA…
Copula-based time series models can model univariate and stationary time series in a flexible way by decomposing the joint distribution of consecutive observations into a copula and the stationary distribution. Implicitly this approach…
Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application given its ability to depict the global metabolic pattern in biological samples. However, the data is noisy and…