Related papers: Comparing Two Contaminated Samples
In this paper we consider a random variable $Y$ contaminated by an independent additive noise $Z$. We assume that $Z$ has known distribution. Our purpose is to test the distribution of the unobserved random variable $Y$. We propose a data…
Specimens are collected from $N$ different sources. Each specimen has probability $p$ of being contaminated (e.g., in the case of an infectious disease, $p$ is the prevalence rate), independently of the other specimens. In many cases group…
A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the…
Consider two random variables contaminated by two unknown transformations. The aim of this paper is to test the equality of those transformations. Two cases are distinguished: first, the two random variables have known distributions.…
In hypothesis testing, the phenomenon of label noise, in which hypothesis labels are switched at random, contaminates the likelihood functions. In this paper, we develop a new method to determine the decision rule when we do not have…
The contamination detection problem aims to determine whether a set of observations has been contaminated, i.e. whether it contains points drawn from a distribution different from the reference distribution. Here, we consider a supervised…
Evaluating whether data streams are drawn from the same distribution is at the heart of various machine learning problems. This is particularly relevant for data generated by dynamical systems since such systems are essential for many…
A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similarly to…
Power-law distributions occur in a wide variety of physical, biological, and social phenomena. In this paper, we propose a statistical hypothesis test based on the log-likelihood ratio to assess whether two samples of discrete data are drawn…
We consider the problem of testing whether pairs of univariate random variables are associated. Few tests of independence exist that are consistent against all dependent alternatives and are distribution-free. We propose novel tests that…
Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is…
Specimens are collected from $N$ different sources. Each specimen has probability $p$ of being contaminated, independently of the other specimens. We assume group testing is applicable, namely one can take small portions from several…
Compositional data (i.e., data comprising random variables that sum up to a constant) arises in many applications including microbiome studies, chemical ecology, political science, and experimental designs. Yet when compositional data serve…
We investigate one/two-sample mean tests for high-dimensional compositional data when the number of variables is comparable with the sample size, as commonly encountered in microbiome research. Existing methods mainly focus on max-type test…
Estimating the prevalence of a disease is necessary for evaluating and mitigating risks of its transmission within or between populations. Estimates that consider how prevalence changes with time provide more information about these risks…
In human microbiome studies, sequencing reads data are often summarized as counts of bacterial taxa at various taxonomic levels specified by a taxonomic tree. This paper considers the problem of analyzing two repeated measurements of…
In this paper, a robust non-parametric measure of statistical dependence, or correlation, between two random variables is presented. The proposed coefficient is a permutation-like statistic that quantifies how much the observed sample S_n :…
A collaborative distributed binary decision problem is considered. Two statisticians are required to declare the correct probability measure of two jointly distributed memoryless processes, denoted by $X^n=(X_1,\dots,X_n)$ and…
This paper examines the statistical properties of a distributional form that arises from pooled testing for the prevalence of a binary outcome. Our base distribution is a two-parameter distribution using a prevalence and excess intensity…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…