Related papers: Doubly robust and computationally efficient high-d…
Confounding control is crucial and yet challenging for causal inference based on observational studies. Under the typical unconfoundedness assumption, augmented inverse probability weighting (AIPW) has been popular for estimating the average…
We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates…
Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of…
We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Specifically, we…
Analyzing principal components for multivariate data from its spatial sign covariance matrix (SCM) has been proposed as a computationally simple and robust alternative to normal PCA, but it suffers from poor efficiency properties and is…
For a data-generating process for random variables that can be described with a linear structural equation model, we consider a situation in which (i) a set of covariates satisfying the back-door criterion cannot be observed or (ii) such a…
The problem of testing changes in covariance has received increasing attention in recent years, especially in the context of high-dimensional testing. A number of approaches have been proposed, all limited to the two-sample problem and…
Conditional independence (CI) testing arises naturally in many scientific problems and application domains. The goal of this problem is to investigate the conditional independence between a response variable $Y$ and another variable $X$,…
We introduce a framework for robust uncertainty quantification in situations where labeled training data are corrupted, through noisy or missing labels. We build on conformal prediction, a statistical tool for generating prediction sets…
Replicability is a lynchpin for credible discoveries. The partial conjunction (PC) p-value, which combines individual base p-values from multiple similar studies, can gauge whether a feature of interest exhibits replicated signals across…
A fundamental task in the analysis of datasets with many variables is screening for associations. This can be cast as a multiple testing task, where the objective is achieving high detection power while controlling type I error. We consider…
We consider the problem of testing the equality of conditional distributions of a response variable given a vector of covariates between two populations. Such a hypothesis testing problem can be motivated by various machine learning and…
Valid estimation of treatment effects from observational data requires proper control of confounding. If the number of covariates is large relative to the number of observations, then controlling for all available covariates is infeasible.…
Randomized clinical trials with time-to-event outcomes have traditionally used the log-rank test followed by the Cox proportional hazards (PH) model to estimate the hazard ratio between the treatment groups. These are valid under the…
Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown,…
In the field of multiple hypothesis testing, combining p-values represents a fundamental statistical method. The Cauchy combination test (CCT) (Liu and Xie, 2020) excels among numerous methods for combining p-values with powerful and…
For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs…
In many scientific problems, researchers try to relate a response variable $Y$ to a set of potential explanatory variables $X = (X_1,\dots,X_p)$, and start by trying to identify variables that contribute to this relationship. In statistical…
It is a common saying that testing for conditional independence, i.e., testing whether two random vectors $X$ and $Y$ are independent, given $Z$, is a hard statistical problem if $Z$ is a continuous random variable (or vector). In…
Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements that arise frequently in various applications, from genomics and neuroscience to economics and finance. As data are…