Statistical Methodology
Policy-Relevant Treatment Effects (PRTEs) are generally not point-identified under standard Instrumental Variable (IV) assumptions when the instrument generates limited support in treatment propensity. We show that PRTE partial…
Modern regression analysis often involves responses and predictors taking values in the same or distinct metric spaces. To rank non-Euclidean heterogeneous predictors in regression by explanatory strength, analogous to the classical $R^2$,…
Understanding the behavior of black-box large language models and determining effective means of comparing their performance is a key task in modern machine learning. We consider how large language models respond to a specific query by…
Threshold-free cluster enhancement (TFCE) is widely used for cluster-based inference in neuroimaging, but existing implementations typically rely on discretized approximations that may introduce numerical variability. We present eTFCE, an…
Causal decomposition analysis (CDA) is an approach for modeling the impact of hypothetical interventions to reduce disparities. It is useful for identifying foci that future interventions, including multilevel and multimodal interventions,…
The generalized linear model (GLM) extends ordinary linear regression by linking the mean of the response variable to covariates through an appropriate link function. GLMs are widely used in the analysis of datasets arising from diverse…
Two binary instrumental variables (IVs) are nested if individuals who comply under one binary IV also comply under the other. This situation often arises when the two IVs represent different intensities of encouragement or discouragement to…
It is often of interest to study the association between covariates and the cumulative incidence of a right-censored time-to-event outcome. When time-varying covariates are measured on a fixed discrete time scale, it is desirable to account…
Structural causal models (SCMs), with an underlying directed acyclic graph (DAG), provide a powerful analytical framework to describe the interaction mechanisms in large-scale complex systems. However, when the system exhibits extreme…
Investigators are often interested in how a treatment affects an outcome for units responding to treatment in a certain way. We may wish to know the effect among units that, for example, meaningfully implemented an intervention, passed an…
This paper studies model selection for general unit-root time series, including the case with many exogenous predictors. We propose a new model selection algorithm, FHTD, that leverages forward stepwise regression (FSR), a high-dimensional…
In ecology, the description of species composition and biodiversity calls for statistical methods that involve estimating features of interest in unobserved samples based on an observed one. In the last decade, the Bayesian nonparametrics…
Many real-world networks evolve dynamically over time and present different types of connections between nodes, often called layers. In this work, we propose a latent position model for these objects, called the dynamic multiplex random dot…
Spatio-temporal areal data can be seen as a collection of time series which are spatially correlated, according to a specific neighbouring structure. Motivated by a dataset on mobile phone usage in the Metropolitan area of Milan, Italy, we…
Exponential families form the backbone of modern statistics and machine learning, but textbooks seldom derive them from first principles in an accessible way. Although minimal sufficiency and the principle of maximum entropy, originating in…
This study quantifies the association between air pollution and mortality in Ontario, Canada. Exposure-response relationships in air pollution epidemiology are complex due to three features: time-lagged associations, non-linear…
We introduce Robust Bayesian Sequential Borrowing (RBSB), a framework for extrapolating evidence across adjacent subgroups in multi-population clinical programmes where studies are conducted in sequence and populations are ordered by…
A nonparametric model using a sequence of Bernstein polynomials is constructed to approximate arbitrary isotropic covariance functions valid in $\mathbb{R}^\infty$, and related approximation properties are investigated using the popular…
Most clinical prediction studies are developed from retrospective cohorts and reported as if all patient information were observed at once. In practice, clinicians face a more consequential question: \emph{when is there already enough…
We propose a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells,…