Related papers: Deploying the Conditional Randomization Test in Hi…
We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y is independent of X given Z. The conditional randomization test (CRT) was recently proposed as a way to…
We propose a new method named the Conditional Randomization Rank Test (CRRT) for testing conditional independence of a response variable Y and a covariate variable X, conditional on the rest of the covariates Z. The new method generalizes…
Identifying the relevant variables for a classification model with correct confidence levels is a central but difficult task in high dimensions. Despite the core role of sparse logistic regression in statistics and machine learning, it still…
Controlling the false discovery rate (FDR) is a powerful approach to multiple testing. In many applications, the tested hypotheses have an inherent hierarchical structure. In this paper, we focus on the fixed sequence structure where the…
The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z. The CRT assumes that the conditional distribution of X given Z is known…
Algorithms that ensure reproducible findings from large-scale, high-dimensional data are pivotal in numerous signal processing applications. In recent years, multivariate false discovery rate (FDR) controlling methods have emerged,…
We propose sequential multiple testing procedures which control the false discovery rate (FDR) or the positive false discovery rate (pFDR) under arbitrary dependence between the data streams. This is accomplished by "optimizing" an upper…
Controlling the false discovery rate (FDR) in variable selection becomes challenging when predictors are correlated, as existing methods often exclude all members of correlated groups and consequently perform poorly for prediction. We…
In many scientific problems, researchers try to relate a response variable $Y$ to a set of potential explanatory variables $X = (X_1,\dots,X_p)$, and start by trying to identify variables that contribute to this relationship. In statistical…
False discovery rate (FDR) control is a popular approach for maintaining the integrity of statistical analyses, especially in high-dimensional data settings, where multiple comparisons increase the risk of false positives. FDR control has…
We consider testing multivariate conditional independence between a response Y and a covariate vector X given additional variables Z. We introduce the Multivariate Sufficient Statistic Conditional Randomization Test (MS-CRT), which…
Modern machine learning models are highly expressive but notoriously difficult to analyze statistically. In particular, while black-box predictors can achieve strong empirical performance, they rarely provide valid hypothesis tests or…
Controlling the false discovery rate (FDR) is a popular approach to multiple testing, variable selection, and related problems of simultaneous inference. In many contemporary applications, models are not specified by discrete variables,…
While data-driven confounder selection requires careful consideration, it is frequently employed in observational studies. Widely recognized criteria for confounder selection include the minimal-set approach, which involves selecting…
Multivariate statistics are often available as well as necessary in hypothesis tests. We study how to use such statistics to control not only false discovery rate (FDR) but also positive FDR (pFDR) with good power. We show that FDR can be…
Simultaneously performing variable selection and inference in high-dimensional regression models is an open challenge in statistics and machine learning. The increasing availability of vast amounts of variables requires the adoption of…
Testing whether a variable of interest affects the outcome is one of the most fundamental problems in statistics and is often the main scientific question of interest. To tackle this problem, the conditional randomization test (CRT) is…
We propose a general and flexible procedure for testing multiple hypotheses about sequential (or streaming) data that simultaneously controls both the false discovery rate (FDR) and false nondiscovery rate (FNR) under minimal assumptions…
Conditional independence (CI) testing is a fundamental task in modern statistics and machine learning. The conditional randomization test (CRT) was recently introduced to test whether two random variables, $X$ and $Y$, are conditionally…
We propose the Terminating-Random Experiments (T-Rex) selector, a fast variable selection method for high-dimensional data. The T-Rex selector controls a user-defined target false discovery rate (FDR) while maximizing the number of selected…