Related papers: Challenges in Bayesian Adaptive Data Analysis

Tight Bounds for Answering Adaptively Chosen Concentrated Queries

Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this,…

Data Structures and Algorithms · Computer Science 2025-11-13 Emma Rapoport, Edith Cohen, Uri Stemmer

Adaptive Data Analysis for Growing Data

Reuse of data in adaptive workflows poses challenges regarding overfitting and the statistical validity of results. Previous work has demonstrated that interacting with data via differentially private algorithms can mitigate overfitting,…

Machine Learning · Computer Science 2025-11-13 Neil G. Marchant, Benjamin I. P. Rubinstein

Generalization in the Face of Adaptivity: A Bayesian Perspective

Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution.…

Machine Learning · Computer Science 2024-04-26 Moshe Shenfeld, Katrina Ligett

Bayesian Adaptive Data Analysis Guarantees from Subgaussianity

The new field of adaptive data analysis seeks to provide algorithms and provable guarantees for models of machine learning that allow researchers to reuse their data, which normally falls outside of the usual statistical paradigm of static…

Machine Learning · Computer Science 2017-03-22 Sam Elder

Algorithmic Stability for Adaptive Data Analysis

Adaptivity is an important feature of data analysis: the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model,…

Machine Learning · Computer Science 2015-11-10 Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

Adaptive Data Analysis in a Balanced Adversarial Model

In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman…

Machine Learning · Computer Science 2023-11-07 Kobbi Nissim, Uri Stemmer, Eliad Tsfadia

More General Queries and Less Generalization Error in Adaptive Data Analysis

Adaptivity is an important feature of data analysis: typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive…

Machine Learning · Computer Science 2015-11-11 Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman

High dimensionality: The latest challenge to data analysis

The advent of modern technology, permitting the measurement of thousands of characteristics simultaneously, has given rise to floods of data characterized by many large or even huge datasets. This new paradigm presents extraordinary…

Methodology · Statistics 2019-02-14 A. M. Pires, J. A. Branco

Subsampling Suffices for Adaptive Data Analysis

Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down…

Machine Learning · Computer Science 2024-09-25 Guy Blanc

Adapting the ABC distance function

Approximate Bayesian computation performs approximate inference for models where likelihood computations are expensive or impossible. Instead simulations from the model are performed for various parameter values and accepted if they are…

Computation · Statistics 2015-12-16 Dennis Prangle

Slamming the sham: A Bayesian model for adaptive adjustment with noisy control data

It is not always clear how to adjust for control data in causal inference, balancing the goals of reducing bias and variance. We show how, in a setting with repeated experiments, Bayesian hierarchical modeling yields an adaptive procedure…

Methodology · Statistics 2025-01-23 Andrew Gelman, Matthijs Vákár

The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science

Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being…

Methodology · Statistics 2018-09-18 Preetum Nakkiran, Jarosław Błasiok

The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

We propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from the existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how…

Machine Learning · Computer Science 2023-04-25 Max Simchowitz, Kevin Jamieson, Benjamin Recht

Adaptive Threshold Sampling

Sampling is a fundamental problem in computer science and statistics. However, for a given task and stream, it is often not possible to choose good sampling probabilities in advance. We derive a general framework for adaptively changing the…

Machine Learning · Statistics 2022-06-16 Daniel Ting

Random thoughts about Complexity, Data and Models

Data Science and Machine learning have been growing strong for the past decade. We argue that to make the most of this exciting field we should resist the temptation of assuming that forecasting can be reduced to brute-force data analytics.…

Artificial Intelligence · Computer Science 2020-05-12 Hykel Hosni, Angelo Vulpiani

Preventing False Discovery in Interactive Data Analysis is Hard

We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given $n$ samples from an unknown distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen statistical queries. A…

Machine Learning · Computer Science 2014-08-08 Moritz Hardt, Jonathan Ullman

Convergence Guarantees for Adaptive Bayesian Quadrature Methods

Adaptive Bayesian quadrature (ABQ) is a powerful approach to numerical integration that empirically compares favorably with Monte Carlo integration on problems of medium dimensionality (where non-adaptive quadrature is not competitive). Its…

Machine Learning · Statistics 2019-10-29 Motonobu Kanagawa, Philipp Hennig

Bayesian Adaptive Calibration and Optimal Design

The process of calibrating computer models of natural phenomena is essential for applications in the physical sciences, where plenty of domain knowledge can be embedded into simulations and then calibrated against real observations. Current…

Machine Learning · Computer Science 2025-01-20 Rafael Oliveira, Dino Sejdinovic, David Howard, Edwin V. Bonilla

A Minimax Theory for Adaptive Data Analysis

In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results in previous steps. The releases are often randomized in order to reduce overfitting for such…

Machine Learning · Statistics 2016-02-16 Yu-Xiang Wang, Jing Lei, Stephen E. Fienberg

Sensitivity And Out-Of-Sample Error in Continuous Time Data Assimilation

Data assimilation refers to the problem of finding trajectories of a prescribed dynamical model in such a way that the output of the model (usually some function of the model states) follows a given time series of observations. Typically…

Atmospheric and Oceanic Physics · Physics 2015-05-30 Jochen Bröcker, Ivan G. Szendro