Related papers: Adaptive Data Analysis for Growing Data
Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis…
Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model,…
Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this,…
Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to rescue. In this setting, analytics tasks become very…
Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is…
Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive…
Adaptive data analysis is frequently criticized for its pessimistic generalization guarantees. The source of these pessimistic bounds is a model that permits arbitrary, possibly adversarial analysts that optimally use information to bias…
In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results in previous steps. The releases are often randomized in order to reduce overfitting for such…
Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being…
The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy,…
Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down…
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution.…
Adaptive data analysis (ADA) involves a dynamic interaction between an analyst and a dataset owner, where the analyst submits queries sequentially, adapting them based on previous answers. This process can become adversarial, as the analyst…
Modern data is messy and high-dimensional, and it is often not clear a priori what are the right questions to ask. Instead, the analyst typically needs to use the data to search for interesting analyses to perform and hypotheses to test.…
Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these…
Adaptive random search approaches have been shown to be effective for global optimization problems, where under certain conditions, the expected performance time increases only linearly with dimension. However, previous analyses assume that…
In this work, we study how to use sampling to speed up mechanisms for answering adaptive queries into datasets without reducing the accuracy of those mechanisms. This is important to do when both the datasets and the number of queries asked…
Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in…
Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA)…
The phenomenon of data distribution evolving over time has been observed in a range of applications, calling the needs of adaptive learning algorithms. We thus study the problem of supervised gradual domain adaptation, where labeled data…