Related papers: Preserving Statistical Validity in Adaptive Data A…

200 papers

We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given $n$ samples from an unknown distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen statistical queries. A…

Machine Learning · Computer Science 2014-08-08 Moritz Hardt, Jonathan Ullman

Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis…

Machine Learning · Computer Science 2015-09-28 Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Prediction, where observed data is used to quantify uncertainty about a future observation, is a fundamental problem in statistics. Prediction sets with coverage probability guarantees are a common solution, but these do not provide…

Statistics Theory · Mathematics 2022-11-22 Leonardo Cella, Ryan Martin

Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model,…

Machine Learning · Computer Science 2015-11-10 Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting…

Machine Learning · Statistics 2021-09-27 Beau Coker, Cynthia Rudin, Gary King

Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these…

Machine Learning · Computer Science 2017-06-19 Vitaly Feldman, Thomas Steinke

Traditional statistical theory assumes that the analysis to be performed on a given data set is selected independently of the data themselves. This assumption breaks down when data are re-used across analyses and the analysis to be…

Machine Learning · Computer Science 2017-06-06 Adam Smith

In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The…

Methodology · Statistics 2019-05-21 Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön

Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are…

Methodology · Statistics 2026-05-26 Tobias Freidling, Qingyuan Zhao, Zijun Gao

As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), researchers increasingly use predictions from…

Machine Learning · Statistics 2025-12-08 Stephen Salerno, Kentaro Hoffman, Awan Afiaz, Anna Neufeld, Tyler H. McCormick, Jeffrey T. Leek

We provide an approach to exploratory data analysis in matched observational studies with a single intervention and multiple endpoints. In such settings, the researcher would like to explore evidence for actual treatment effects among these…

Methodology · Statistics 2025-12-10 Mengqi Lin, Colin Fogarty

We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time. We first provide a central limit…

Machine Learning · Statistics 2024-03-05 Thomas Cook, Alan Mishler, Aaditya Ramdas

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to…

Machine Learning · Computer Science 2021-11-23 Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive…

Machine Learning · Computer Science 2015-11-11 Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus…

Machine Learning · Statistics 2024-07-08 Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi

Predictive inference is a fundamental task in statistics, traditionally addressed using parametric assumptions about the data distribution and detailed analyses of how models learn from data. In recent years, conformal prediction has…

Methodology · Statistics 2026-03-26 Matteo Sesia, Stefano Favaro

Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by…

Machine Learning · Computer Science 2023-04-20 Kelly W. Zhang, Lucas Janson, Susan A. Murphy

There is an increasing concern that most current published research findings are false. The main cause seems to lie in the fundamental disconnection between theory and practice in data analysis. While the former typically relies on…

Machine Learning · Statistics 2019-03-06 Amedeo Roberto Esposito, Michael Gastpar, Ibrahim Issa

Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being…

Methodology · Statistics 2018-09-18 Preetum Nakkiran, Jarosław Błasiok

The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation:…

Statistics Theory · Mathematics 2013-01-23 Ali Kinkhabwala