Related papers: Machine Learning for Two-Sample Testing under Righ…

Two-sample Testing Using Deep Learning

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are…

Machine Learning · Statistics · 2020-03-11 · Matthias Kirchler, Shahryar Khorasani, Marius Kloft, Christoph Lippert

The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study

Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We…

Software Engineering · Computer Science · 2023-04-18 · Afonso Fontes, Gregory Gay

AutoML Two-Sample Test

Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery and for detecting distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard…

Machine Learning · Computer Science · 2023-01-18 · Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf

Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms

Unlike parametric regression, machine learning (ML) methods do not generally require precise knowledge of the true data generating mechanisms. As such, numerous authors have advocated for ML methods to estimate causal effects.…

Methodology · Statistics · 2020-05-15 · Ashley I Naimi, Alan E Mishler, Edward H Kennedy

Support Vector Regression for Right Censored Data

We develop a unified approach for classification and regression support vector machines for data subject to right censoring. We provide finite sample bounds on the generalization error of the algorithm, prove risk consistency for a wide…

Machine Learning · Statistics · 2013-01-15 · Yair Goldberg, Michael R. Kosorok

A label-efficient two-sample test

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…

Machine Learning · Computer Science · 2022-07-20 · Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

ML-assisted Randomization Tests for Detecting Treatment Effects in A/B Experiments

Experimentation is widely utilized for causal inference and data-driven decision-making across disciplines. In an A/B experiment, for example, an online business randomizes two different treatments (e.g., website designs) to their customers…

Methodology · Statistics · 2025-01-15 · Wenxuan Guo, JungHo Lee, Panos Toulis
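For orientation, here is a minimal sketch of a plain Fisher randomization test for an A/B experiment, assuming complete randomization (a fixed number of units assigned to each arm). It uses the difference in means as the test statistic; the ML-assisted statistic proposed in the paper is not reproduced here, and the function names are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, assignment):
    """Mean outcome under treatment B (assignment 1) minus mean under A (assignment 0)."""
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    return mean(treated) - mean(control)


def randomization_test(outcomes, assignment, n_draws=2000, seed=0):
    """Two-sided Monte Carlo p-value for the sharp null of no effect on any unit.

    Under the sharp null every outcome is fixed regardless of assignment, so
    reshuffling the observed assignment vector (same arm sizes) draws from the
    randomization distribution of the statistic under complete randomization.
    """
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, assignment))
    shuffled = list(assignment)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        if abs(diff_in_means(outcomes, shuffled)) >= observed:
            hits += 1
    # The add-one correction keeps the Monte Carlo p-value valid and never exactly 0.
    return (hits + 1) / (n_draws + 1)
```

Swapping `diff_in_means` for any other statistic (for example one built from a fitted model's predictions) leaves the validity argument unchanged, because only the assignment mechanism is being simulated.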

Learning continuous models for continuous physics

Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach…

Machine Learning · Computer Science · 2023-11-23 · Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney

Global and Local Two-Sample Tests via Regression

Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…

Methodology · Statistics · 2019-11-19 · Ilmun Kim, Ann B. Lee, Jing Lei

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…

Machine Learning · Statistics · 2022-01-06 · Feng Liu, Wenkai Xu, Jie Lu, Danica J. Sutherland

A Double Machine Learning Approach to Combining Experimental and Observational Data

Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption…

Methodology · Statistics · 2025-11-26 · Harsh Parikh, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky

MMD Two-sample Testing in the Presence of Arbitrarily Missing Data

In many real-world applications, it is common that a proportion of the data may be missing or only partially observed. We develop a novel two-sample testing method based on the Maximum Mean Discrepancy (MMD) which accounts for missing data…

Methodology · Statistics · 2024-05-27 · Yijin Zeng, Niall M. Adams, Dean A. Bodenham
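As background for the MMD entry, a minimal sketch of the standard, complete-data MMD two-sample test: the unbiased estimate of squared MMD with a Gaussian kernel, calibrated by permutation. It does not handle missing values, which is the paper's contribution; the fixed bandwidth of 1.0 and the function names are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys (each of size >= 2)."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal i == j to remove bias.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(xs, ys, bandwidth=1.0, n_perms=200, seed=0):
    """Permutation p-value for H0: P = Q using the unbiased MMD^2 statistic."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    hits = 0
    for _ in range(n_perms):
        # Under H0 the pooled points are exchangeable, so any relabelling is equally likely.
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            hits += 1
    return (hits + 1) / (n_perms + 1)
```

The quadratic-time estimator keeps the sketch short; in practice the bandwidth is usually chosen from the data (for example a median heuristic) or learned, as in the kernel-learning entries in this list.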

Machine Learning Testing: Survey, Landscapes and Horizons

This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program,…

Machine Learning · Computer Science · 2019-12-24 · Jie M. Zhang, Mark Harman, Lei Ma, Yang Liu

Advanced Tutorial: Label-Efficient Two-Sample Tests

Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are from identical…

Machine Learning · Computer Science · 2025-01-08 · Weizhi Li, Visar Berisha, Gautam Dasarathy

Double/Debiased Machine Learning for Treatment and Causal Parameters

Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal…

Machine Learning · Statistics · 2024-11-05 · Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, James Robins

Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions

Estimating causal effects using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume settings with cross-sectional data, whereas…

Econometrics · Economics · 2024-09-04 · Jonathan Fuhr, Dominik Papies

Power Studies For Two-sample Methods For Multivariate Data

We present the results of a large number of simulation studies regarding the power of various non-parametric two-sample tests for multivariate data. This includes both continuous and discrete data. In general no single method can be relied…

Methodology · Statistics · 2025-07-23 · Wolfgang Rolke
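Power studies of this kind estimate, by simulation, the fraction of data sets on which a test rejects at level alpha. A minimal sketch of that harness follows; to keep it short it plugs in a univariate large-sample z-test for a mean shift as a stand-in, not one of the multivariate non-parametric tests the paper compares. All names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_pvalue(xs, ys):
    """Two-sided large-sample z-test for equal means (a simple stand-in test)."""
    se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
    z = (mean(xs) - mean(ys)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(test, sample_pair, n_sims=1000, alpha=0.05, seed=0):
    """Fraction of simulated data sets on which `test` rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs, ys = sample_pair(rng)
        if test(xs, ys) < alpha:
            rejections += 1
    return rejections / n_sims


def shifted_normals(shift, n=50):
    """Returns a sampler drawing n points from N(0, 1) and n points from N(shift, 1)."""
    def sampler(rng):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(shift, 1.0) for _ in range(n)]
        return xs, ys
    return sampler
```

With `shift=0.0` the estimate is the empirical size and should sit close to alpha; with `shift=0.5` and 50 points per group it should land near the normal-theory power Φ(0.5/√(2/50) − 1.96) ≈ 0.70. Any function returning a p-value can replace `z_test_pvalue`.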

Data Transformations and Goodness-of-Fit Tests for Type-II Right Censored Samples

We suggest several goodness-of-fit methods which are appropriate with Type-II right censored data. Our strategy is to transform the original observations from a censored sample into an approximately i.i.d. sample of normal variates and then…

Methodology · Statistics · 2013-12-12 · Christian Goldmann, Bernhard Klar, Simos G. Meintanis

Learn to Unlearn: A Survey on Machine Unlearning

Machine Learning (ML) models have been shown to potentially leak sensitive information, thus raising privacy concerns in ML-driven applications. This inspired recent research on removing the influence of specific data samples from a trained…

Machine Learning · Computer Science · 2023-10-30 · Youyang Qu, Xin Yuan, Ming Ding, Wei Ni, Thierry Rakotoarivelo, David Smith

Revisiting Classifier Two-Sample Tests

The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…

Machine Learning · Statistics · 2018-03-14 · David Lopez-Paz, Maxime Oquab
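The classifier two-sample test idea can be sketched as follows: label points from $S_P$ as 0 and from $S_Q$ as 1, train a binary classifier on half of the pooled data, and compare its held-out accuracy with chance. This minimal version assumes equal sample sizes (so chance accuracy is 1/2) and uses a k-nearest-neighbour classifier as an illustrative choice; the function names and k = 5 are assumptions, not the paper's setup.

```python
import math
import random
from statistics import NormalDist


def knn_predict(train, point, k=5):
    """Majority label among the k nearest training points (Euclidean distance, k odd)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def classifier_two_sample_test(sample_p, sample_q, k=5, seed=0):
    """Returns (held-out accuracy, one-sided p-value) of a k-NN classifier two-sample test."""
    rng = random.Random(seed)
    # Label points from P as 0 and points from Q as 1, then shuffle and split in half.
    data = [(x, 0) for x in sample_p] + [(x, 1) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    correct = sum(knn_predict(train, x, k) == label for x, label in test)
    accuracy = correct / len(test)
    # Under H0 (P = Q) with balanced classes, held-out accuracy is approximately
    # N(1/2, 1/(4 * n_test)); accuracy well above 1/2 is evidence against H0.
    z = (accuracy - 0.5) / math.sqrt(0.25 / len(test))
    p_value = 1.0 - NormalDist().cdf(z)
    return accuracy, p_value
```

Any classifier can replace `knn_predict`; the held-out split is what makes the accuracy an honest statistic, since accuracy on the training half would be optimistic even when P = Q.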