Statistics Methodology
Methods for causal inference are well developed for binary and continuous exposures, but in many settings the exposure has a substantial mass at zero; such exposures are called semi-continuous. We propose a general causal framework for such…
We propose a conditional independence (CI) test based on a new measure, the spectral generalized covariance measure (SGCM). The SGCM is constructed by expressing the squared norm of the conditional cross-covariance operator in…
Tests of goodness of fit are used in nearly every domain where statistics is applied. One powerful and flexible approach is to sample artificial data sets that are exchangeable with the real data under the null hypothesis (but not under the…
Financial spillovers in interconnected systems, such as global banking networks, require tools that capture temporal and frequency dynamics, while incorporating the underlying network topology. While current network time series models are…
Statistical methods for metric spaces provide a general and versatile framework for analyzing complex data types. We introduce a novel approach for constructing confidence regions around new predictions from any bagged regression algorithm…
Real-world vaccine effectiveness has increasingly been studied using matching-based approaches, particularly in observational cohort studies following the target trial emulation framework. Although matching is appealing in its simplicity,…
Overestimation of turnout has long been an issue in election surveys, with nonresponse bias or voter overrepresentation identified as major sources of bias. However, adjusting for nonignorable nonresponse bias is substantially challenging.…
This paper introduces the Eigenvalue-Based Randomness (EBR) test, a novel approach rooted in the Tracy-Widom law from random matrix theory, and applies it to residual analysis in panel data models. Unlike traditional…
The analysis of randomized controlled trials is often complicated by intercurrent events (IEs): events that occur after treatment initiation and affect either the interpretation or existence of outcome measurements. Examples include…
Advances in computational power and methodology have enabled research on massive datasets. However, tools for analyzing data with directional or periodic characteristics, such as wind directions and customers' arrival times in 24-hour…
From environmental sciences to finance, there is a growing demand for methods that can assess the risks of extreme events beyond those observed in available data. Extrapolating extreme events beyond the range of the data is not obvious.…
Missing data is a common challenge in studying treatment effects. In the context of mediation analysis, this paper addresses missingness in the mediator and outcome, focusing on identification. We first consider self-separated missingness…
Recent work has focused on nonparametric estimation of conditional treatment effects, but inference has remained relatively unexplored. We propose a class of nonparametric tests for both quantitative and qualitative treatment effect…
Causal effects are often characterized with population summaries. These might provide an incomplete picture when there are heterogeneous treatment effects across subgroups. Since the subgroup structure is typically unknown, it is more…
In biomedical studies, testing for differences in covariance offers scientific insights beyond mean differences, especially when differences are driven by complex joint behavior between features. However, when differences in joint behavior…
The two-phase sampling design is a cost-effective strategy widely used in public health research. Analyzing the Phase II sample often involves creating subsample-specific weights. However, these weights can be highly variable, leading to…
This paper introduces a novel measure to quantify the directional dependence of extreme events between two variables. The proposed approach is designed to capture asymmetric tail dependence by studying conditional tail expectations of…
In this paper, we consider Italy's academic department ranking system, which is based on a performance index named Indice Standardizzato di Performance Dipartimentale (ISPD). While the ISPD has been criticized for its marked…
We study high-dimensional mediation analysis in which exposures, mediators, and outcomes are all multivariate, and both exposures and mediators may be high-dimensional. We formalize this as a many (exposures)-to-many (mediators)-to-many…
We introduce a family of scale-invariant entropy statistics derived from logarithmically aggregated distance distributions of point processes, with prime numbers serving as a motivating example. The construction associates to each finite…