Related papers: Comparing Two Contaminated Samples

Testing distribution in deconvolution problems

In this paper we consider a random variable $Y$ contaminated by an independent additive noise $Z$. We assume that $Z$ has known distribution. Our purpose is to test the distribution of the unobserved random variable $Y$. We propose a data…

Statistics Theory · Mathematics 2009-01-28 Denys Pommeret

A binary search scheme for determining all contaminated specimens

Specimens are collected from $N$ different sources. Each specimen has probability $p$ of being contaminated (e.g., in the case of an infectious disease, $p$ is the prevalence rate), independently of the other specimens. In many cases group…

Probability · Mathematics 2024-01-31 Vassilis G. Papanicolaou

Consistent distribution-free $K$-sample and independence tests for univariate random variables

A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the…

Methodology · Statistics 2016-04-28 Ruth Heller, Yair Heller, Shachar Kaufman, Barak Brill, Malka Gorfine

Testing for equality between two transformations of random variables

Consider two random variables contaminated by two unknown transformations. The aim of this paper is to test the equality of those transformations. Two cases are distinguished: first, the two random variables have known distributions.…

Methodology · Statistics 2011-11-01 Mohamed Boutahar, Denys Pommeret

Robust Binary Hypothesis Testing Under Contaminated Likelihoods

In hypothesis testing, the phenomenon of label noise, in which hypothesis labels are switched at random, contaminates the likelihood functions. In this paper, we develop a new method to determine the decision rule when we do not have…

Information Theory · Computer Science 2014-10-28 Dennis Wei, Kush R. Varshney

Supervised Contamination Detection, with Flow Cytometry Application

The contamination detection problem aims to determine whether a set of observations has been contaminated, i.e. whether it contains points drawn from a distribution different from the reference distribution. Here, we consider a supervised…

Methodology · Statistics 2024-04-10 Solenne Gaucher, Gilles Blanchard, Frédéric Chazal

A Kernel Two-sample Test for Dynamical Systems

Evaluating whether data streams are drawn from the same distribution is at the heart of various machine learning problems. This is particularly relevant for data generated by dynamical systems since such systems are essential for many…

Machine Learning · Statistics 2022-09-07 Friedrich Solowjow, Dominik Baumann, Christian Fiedler, Andreas Jocham, Thomas Seel, Sebastian Trimpe

Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions

A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similar to…

Machine Learning · Statistics 2020-11-20 Charvi Rastogi, Sivaraman Balakrishnan, Nihar B. Shah, Aarti Singh

Two samples test for discrete power-law distributions

Power-law distributions occur in a wide variety of physical, biological, and social phenomena. In this paper, we propose a statistical hypothesis test based on the log-likelihood ratio to assess whether two samples of discrete data are drawn…

Methodology · Statistics 2015-03-03 Alessandro Bessi

Consistent distribution-free tests of association between univariate random variables

We consider the problem of testing whether pairs of univariate random variables are associated. Few tests of independence exist that are consistent against all dependent alternatives and are distribution free. We propose novel tests that…

Methodology · Statistics 2014-12-09 Ruth Heller, Yair Heller, Shachar Kaufman, Malka Gorfine

Testing Independence of Exchangeable Random Variables

Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is…

Statistics Theory · Mathematics 2022-10-25 Marcus Hutter

Specimens are collected from $N$ different sources. Each specimen has probability $p$ of being contaminated, independently of the other specimens. We assume group testing is applicable, namely one can take small portions from several…

Probability · Mathematics 2024-09-24 Vassilis G. Papanicolaou

Compositional Covariate Importance Testing via Partial Conjunction of Bivariate Hypotheses

Compositional data (i.e., data comprising random variables that sum up to a constant) arises in many applications including microbiome studies, chemical ecology, political science, and experimental designs. Yet when compositional data serve…

Methodology · Statistics 2025-01-03 Ritwik Bhaduri, Siyuan Ma, Lucas Janson

On testing mean of high dimensional compositional data

We investigate one/two-sample mean tests for high-dimensional compositional data when the number of variables is comparable with the sample size, as commonly encountered in microbiome research. Existing methods mainly focus on max-type test…

Statistics Theory · Mathematics 2024-04-15 Qianqian Jiang, Wenbo Li, Zeng Li

Pool samples to efficiently estimate pathogen prevalence dynamics

Estimating the prevalence of a disease is necessary for evaluating and mitigating risks of its transmission within or between populations. Estimates that consider how prevalence changes with time provide more information about these risks…

Applications · Statistics 2021-11-12 Braden Scherting, Alison Peel, Raina Plowright, Andrew Hoegh

A Model for Paired-Multinomial Data and Its Application to Analysis of Data on a Taxonomic Tree

In human microbiome studies, sequencing reads data are often summarized as counts of bacterial taxa at various taxonomic levels specified by a taxonomic tree. This paper considers the problem of analyzing two repeated measurements of…

Applications · Statistics 2017-02-17 Pixu Shi, Hongzhe Li

A Nonparametric Test of Dependence Based on Ensemble of Decision Trees

In this paper, a robust non-parametric measure of statistical dependence, or correlation, between two random variables is presented. The proposed coefficient is a permutation-like statistic that quantifies how much the observed sample S_n :…

Methodology · Statistics 2020-07-27 Rami Mahdi

Collaborative Distributed Hypothesis Testing

A collaborative distributed binary decision problem is considered. Two statisticians are required to declare the correct probability measure of two jointly distributed memoryless processes, denoted by $X^n=(X_1,\dots,X_n)$ and…

Information Theory · Computer Science 2016-04-11 Gil Katz, Pablo Piantanida, Merouane Debbah

An examination of the generalised pooled binomial distribution and its information properties

This paper examines the statistical properties of a distributional form that arises from pooled testing for the prevalence of a binary outcome. Our base distribution is a two-parameter distribution using a prevalence and excess intensity…

Methodology · Statistics 2021-08-11 Ben O'Neill, Angus McLure

Two-Sample Test Based on Classification Probability

Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…

Statistics Theory · Mathematics 2019-09-18 Haiyan Cai, Bryan Goggin, Qingtang Jiang