Related papers: Robust test statistics for data sets with missing …
Ideally, all analyses of normally distributed data should include the full covariance information between all data points. In practice, the full covariance matrix is not always available, either because a result was…
We develop inference procedures robust to general forms of weak dependence. The procedures utilize test statistics constructed by resampling in a manner that does not depend on the unknown correlation structure of the data. We prove that…
Experiments often yield non-identically distributed data for statistical analysis. Tests of hypothesis under such set-ups are generally performed using the likelihood ratio test, which is non-robust with respect to outliers and model…
Under complete linkage disequilibrium (LD), robust tests often have greater power than Pearson's chi-square test and trend tests for the analysis of case-control genetic association studies. Robust statistics have been used in…
Various statistical tests have been developed for testing the equality of means in matched pairs with missing values. However, most existing methods are commonly based on certain distributional assumptions such as normality, 0-symmetry or…
Datasets typically contain inaccuracies due to human error and societal biases, and these inaccuracies can affect the outcomes of models trained on such datasets. We present a technique for certifying whether linear regression models are…
Spurious correlations allow flexible models to predict well during training but poorly on related test distributions. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance…
We consider a data-driven robust hypothesis test where the optimal test will minimize the worst-case performance regarding distributions that are close to the empirical distributions with respect to the Wasserstein distance. This leads to a…
We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the…
The problem of testing changes in covariance has received increasing attention in recent years, especially in the context of high-dimensional testing. A number of approaches have been proposed, all limited to the two-sample problem and…
Detecting changes in high-dimensional vectors presents significant challenges, especially when the post-change distribution is unknown and time-varying. This paper introduces a novel robust algorithm for correlation change detection in…
The last decade witnessed an explosion in the availability of data for operations research applications. Motivated by this growing availability, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization…
This paper introduces a new method for testing the statistical significance of estimated parameters in predictive regressions. The approach features a new family of test statistics that are robust to the degree of persistence of the…
Null hypothesis significance testing remains popular despite decades of concern about misuse and misinterpretation. We believe that much of the problem is due to language: significance testing has little to do with other meanings of the…
To assess whether there is some signal in a big database, aggregate tests for the global null hypothesis of no effect are routinely applied in practice before more specialized analysis is carried out. Although a plethora of aggregate tests…
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus…
This paper focuses on the problem of testing the null hypothesis that the regression functions of several populations are equal under a general nonparametric homoscedastic regression model. It is well known that linear kernel regression…
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is…
Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets.…
In genetic studies of complex diseases, the underlying mode of inheritance is often not known. Thus, the most powerful test or other optimal procedure for one model, e.g. recessive, may be quite inefficient if another model, e.g. dominant,…