Related papers: Adaptive Data Analysis for Growing Data

Generalization in Adaptive Data Analysis and Holdout Reuse

Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis…

Machine Learning · Computer Science · 2015-09-28 · Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Algorithmic Stability for Adaptive Data Analysis

Adaptivity is an important feature of data analysis: the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model,…

Machine Learning · Computer Science · 2015-11-10 · Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

Tight Bounds for Answering Adaptively Chosen Concentrated Queries

Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this,…

Data Structures and Algorithms · Computer Science · 2025-11-13 · Emma Rapoport, Edith Cohen, Uri Stemmer

Adaptive Learning of Aggregate Analytics under Dynamic Workloads

Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to the rescue. In this setting, analytics tasks become very…

Databases · Computer Science · 2020-03-17 · Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

Challenges in Bayesian Adaptive Data Analysis

Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is…

Machine Learning · Computer Science · 2017-03-22 · Sam Elder

More General Queries and Less Generalization Error in Adaptive Data Analysis

Adaptivity is an important feature of data analysis: typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive…

Machine Learning · Computer Science · 2015-11-11 · Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman

Natural Analysts in Adaptive Data Analysis

Adaptive data analysis is frequently criticized for its pessimistic generalization guarantees. The source of these pessimistic bounds is a model that permits arbitrary, possibly adversarial analysts that optimally use information to bias…

Machine Learning · Computer Science · 2019-05-14 · Tijana Zrnic, Moritz Hardt

A Minimax Theory for Adaptive Data Analysis

In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results in previous steps. The releases are often randomized in order to reduce overfitting for such…

Machine Learning · Statistics · 2016-02-16 · Yu-Xiang Wang, Jing Lei, Stephen E. Fienberg
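
The interaction described in this abstract (each query chosen from earlier answers, with randomized releases) can be sketched as follows. This is a minimal illustration, not the mechanism analyzed by Wang, Lei and Fienberg: it assumes statistical queries that map each record into [0, 1], and it adds Gaussian noise with a hypothetical scale `sigma`. The analyst rule that picks the second query is likewise hypothetical.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def noisy_answer(
    sample: Sequence[T],
    query: Callable[[T], float],
    sigma: float,
    rng: random.Random,
) -> float:
    """Release the empirical mean of `query` over `sample`, perturbed by noise.

    Assumes `query` maps each record into [0, 1] (a statistical query).
    `sigma` is a hypothetical noise scale chosen by the data holder.
    """
    empirical = sum(query(x) for x in sample) / len(sample)
    noisy = empirical + rng.gauss(0.0, sigma)
    # Clip so the released value stays in the query's range [0, 1].
    return min(1.0, max(0.0, noisy))


# Adaptive interaction: the second query depends on the first (noisy) answer.
rng = random.Random(0)
data = [rng.random() for _ in range(1000)]
a1 = noisy_answer(data, lambda x: float(x > 0.5), sigma=0.02, rng=rng)
threshold = 0.75 if a1 > 0.5 else 0.25  # hypothetical analyst rule
a2 = noisy_answer(data, lambda x: float(x > threshold), sigma=0.02, rng=rng)
print(a1, threshold, a2)
```

Clipping keeps each release a valid answer to a [0, 1]-valued query; larger `sigma` hides more about the sample at the cost of per-query accuracy.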

The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science

Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being…

Methodology · Statistics · 2018-09-18 · Preetum Nakkiran, Jarosław Błasiok

Adaptive Data Analysis with Correlated Observations

The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy,…

Machine Learning · Computer Science · 2022-01-24 · Aryeh Kontorovich, Menachem Sadigurschi, Uri Stemmer

Subsampling Suffices for Adaptive Data Analysis

Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down…

Machine Learning · Computer Science · 2024-09-25 · Guy Blanc

Generalization in the Face of Adaptivity: A Bayesian Perspective

Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution.…

Machine Learning · Computer Science · 2024-04-26 · Moshe Shenfeld, Katrina Ligett
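
The overfitting this abstract describes, where the empirical value of an adaptively chosen query drifts far from its population mean, can be reproduced with a small Freedman-style experiment. This is an illustrative sketch, not the method of Shenfeld and Ligett: features and labels are independent uniform ±1 values, so any classifier has population accuracy exactly 0.5, and the sizes `n`, `d`, `k` and `n_fresh` are arbitrary choices made here.

```python
import random


def adaptive_overfitting_demo(n=200, d=500, k=51, n_fresh=1000, seed=0):
    """Return (accuracy on the reused sample, accuracy on fresh data).

    Labels are independent of the features, so the true accuracy of any
    classifier is 0.5. Sizes are illustrative, not taken from any paper.
    """
    rng = random.Random(seed)

    def draw(m):
        xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]
        ys = [rng.choice((-1, 1)) for _ in range(m)]
        return xs, ys

    xs, ys = draw(n)
    # Round 1: empirical correlation of every feature with the label.
    corr = [sum(x[j] * y for x, y in zip(xs, ys)) / n for j in range(d)]
    # Round 2 depends on round-1 answers: keep the k most correlated features.
    chosen = sorted(range(d), key=lambda j: abs(corr[j]), reverse=True)[:k]
    signs = {j: (1 if corr[j] >= 0 else -1) for j in chosen}

    def accuracy(xs_eval, ys_eval):
        hits = 0
        for x, y in zip(xs_eval, ys_eval):
            # k is odd and features are +/-1, so the score is never zero.
            score = sum(signs[j] * x[j] for j in chosen)
            hits += (1 if score > 0 else -1) == y
        return hits / len(ys_eval)

    fresh_xs, fresh_ys = draw(n_fresh)
    return accuracy(xs, ys), accuracy(fresh_xs, fresh_ys)


train_acc, fresh_acc = adaptive_overfitting_demo()
print(f"reused sample: {train_acc:.2f}, fresh sample: {fresh_acc:.2f}")
```

Only two rounds of adaptivity are needed: the accuracy measured on the reused sample sits well above 0.5, while fresh data reveals that the classifier is no better than chance.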

Over-the-Air Federated Adaptive Data Analysis: Preserving Accuracy via Opportunistic Differential Privacy

Adaptive data analysis (ADA) involves a dynamic interaction between an analyst and a dataset owner, where the analyst submits queries sequentially, adapting them based on previous answers. This process can become adversarial, as the analyst…

Human-Computer Interaction · Computer Science · 2025-01-22 · Amir Hossein Hadavi, Mohammad M. Mojahedian, Mohammad Reza Aref

How much does your data exploration overfit? Controlling bias via information usage

Modern data is messy and high-dimensional, and it is often not clear a priori what are the right questions to ask. Instead, the analyst typically needs to use the data to search for interesting analyses to perform and hypotheses to test.…

Machine Learning · Statistics · 2019-10-09 · Daniel Russo, James Zou

Generalization for Adaptively-chosen Estimators via Stable Median

Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these…

Machine Learning · Computer Science · 2017-06-19 · Vitaly Feldman, Thomas Steinke

Hesitant Adaptive Search with Estimation and Quantile Adaptive Search for Global Optimization with Noise

Adaptive random search approaches have been shown to be effective for global optimization problems, where under certain conditions, the expected performance time increases only linearly with dimension. However, previous analyses assume that…

Optimization and Control · Mathematics · 2022-03-22 · David D. Linz, Zelda B. Zabinsky

Sampling Without Compromising Accuracy in Adaptive Data Analysis

In this work, we study how to use sampling to speed up mechanisms for answering adaptive queries into datasets without reducing the accuracy of those mechanisms. This is important to do when both the datasets and the number of queries asked…

Machine Learning · Computer Science · 2020-01-03 · Benjamin Fish, Lev Reyzin, Benjamin I. P. Rubinstein

Data-conforming data-driven control: avoiding premature generalizations beyond data

Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in…

Systems and Control · Electrical Eng. & Systems · 2025-08-25 · Mohammad Ramadan, Evan Toler, Mihai Anitescu

Efficient Adaptive Data Analysis over Dense Distributions

Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA)…

Machine Learning · Computer Science · 2026-02-10 · Joon Suk Huh

Algorithms and Theory for Supervised Gradual Domain Adaptation

The phenomenon of data distribution evolving over time has been observed in a range of applications, calling for adaptive learning algorithms. We thus study the problem of supervised gradual domain adaptation, where labeled data…

Machine Learning · Computer Science · 2022-11-15 · Jing Dong, Shiji Zhou, Baoxiang Wang, Han Zhao