Related papers: Testing Data Binnings
We study the question of identity testing for structured distributions. More precisely, given samples from a {\em structured} distribution $q$ over $[n]$ and an explicit distribution $p$ over $[n]$, we wish to distinguish whether $q=p$…
Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are…
In this work, we revisit the problem of uniformity testing of discrete probability distributions. A fundamental problem in distribution testing, testing uniformity over a known domain has been addressed over a significant line of works, and…
We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and…
Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to…
We investigate the problem of identity testing for multidimensional histogram distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D \subseteq \mathbb{R}^d$, is called a $k$-histogram if there exists a partition of the…
There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$. We study two of the most important tests under the conditional-sampling model where each query…
We study a hypothesis testing problem in which data is compressed distributively and sent to a detector that seeks to decide between two possible distributions for the data. The aim is to characterize all achievable encoding rates and…
We provide a distribution-free test that can be used to determine whether any two joint distributions $p$ and $q$ are statistically different by inspection of a large enough set of samples. Following recent efforts from Long et al. [1], we…
We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $s$ distributions, $p_1, p_2, \ldots, p_s$, we design testers for the…
We study the problems of identity and closeness testing of $n$-dimensional product distributions. Prior works by Canonne, Diakonikolas, Kane and Stewart (COLT 2017) and Daskalakis and Pan (COLT 2017) have established tight sample complexity…
We examine the extent to which sublinear-sample property testing and estimation apply to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing…
Testing to see whether a given data set comes from some specified distribution is among the oldest types of problems in Statistics. Many such tests have been developed and their performance studied. The general result has been that while a…
We study distribution testing without direct access to a source of relevant data, but rather to one where only a tiny fraction is relevant. To enable this, we introduce the following verification query model. The goal is to perform a…
We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been…
There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example,…
We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< \epsilon, \delta <1$, we…
We consider the problem of testing distribution identity. Given a sequence of independent samples from an unknown distribution on a domain of size $n$, the goal is to check if the unknown distribution approximately equals a known distribution…
We consider the identity testing problem (or goodness-of-fit testing problem) in multivariate binomial families, multivariate Poisson families and multinomial distributions. Given a known distribution $p$ and $n$ iid samples drawn from an…
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over…