Statistical Methodology
Regression is the workhorse of statistics and is often applied to real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by…
Survey sampling is concerned with the estimation of finite population parameters. In practice, survey data suffer from item nonresponse, which is commonly handled through imputation, i.e., replacing missing values with predicted values. As…
With the advent of effective pre-exposure prophylaxis agents, active-controlled HIV prevention trials have become a common study design. Nevertheless, estimating absolute efficacy relative to a placebo remains important. In this paper, we…
Mixture models are widely used to model heterogeneous populations. A standard approach to mixture modeling assumes that each mixture component takes a parametric kernel form. In many applications, making parametric assumptions on the…
Backtesting risk measures is a central task in financial regulation. While standard backtests evaluate whether a forecasting model is statistically consistent with observed losses, regulatory practice often requires assessing the…
Inverse problems are crucial for many applications in science, engineering and medicine that involve data assimilation, design, and imaging. Their solution infers the parameters or latent states of a complex system from noisy data and…
Treatment effect heterogeneity refers to the systematic variation in treatment effects across subgroups. There is an increasing need for clinical trials that aim to investigate treatment effect heterogeneity and estimate subgroup-specific…
To characterize the community structure in network data, researchers have developed various block-type models, including the stochastic block model, the degree-corrected stochastic block model, the mixed membership block model, the…
The progression of chronic diseases often follows highly variable trajectories, and the underlying factors remain poorly understood. Standard mixed-effects models typically represent inter-patient differences as random deviations around a…
This paper develops a comprehensive Markov-based framework for modelling reservoir behaviour and assessing key performance measures such as reliability and resilience. We first formulate a stochastic model for a finite-capacity dam,…
Producing reliable estimates of health and demographic indicators at fine areal scales is crucial for examining heterogeneity and supporting localized health policy. However, many surveys release outcomes only at coarser administrative…
The difference-in-differences (DiD) design is a quasi-experimental method for estimating treatment effects. In staggered DiD with multiple treatment groups and periods, estimation based on the two-way fixed effects model yields negative…
Dynamic structural equation models (DSEMs) combine time-series modeling of within-person processes with hierarchical modeling of between-person differences and differences between timepoints, and have become very popular for the analysis of…
Bounded continuous data on the unit interval frequently arise in applied fields and often exhibit a non-negligible proportion of observations at the boundaries. Inflated regression models address this feature by combining a continuous…
Multiple seasonalities have been widely studied in continuous time series using models such as TBATS, for instance in electricity demand forecasting. However, their treatment in categorical time series, such as air quality index (AQI) data,…
Regression discontinuity designs (RDD) are widely used for causal inference. In many empirical applications, treatment effects vary substantially with covariates, and ignoring such heterogeneity can lead to misleading conclusions, which…
This paper is motivated by a cutting-edge application in neuroscience: the analysis of electroencephalogram (EEG) signals recorded under flash stimulation. Under commonly used signal-processing assumptions, only the phase angle of the EEG…
Many learning tasks represent responses as multivariate probability measures, requiring repeated computation of weighted barycenters in Wasserstein space. In multivariate settings, transport barycenters are often computationally demanding…
Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism, the ability to capture mixed-type and multimodal tabular data, against causal…
Fine stratification is useful in many survey applications because its point estimator is unbiased, but a design-based variance estimator cannot be easily obtained, particularly when the sample size per stratum is as small as one unit. One…