Related papers: Using Data Compressors to Construct Rank Tests
Random Number Generators play a critical role in a number of important applications. In practice, statistical testing is employed to gather evidence that a generator indeed produces numbers that appear to be random. In this paper, we…
Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for…
A massive dataset often consists of a growing number of (potentially) heterogeneous sub-populations. This paper is concerned with testing various forms of heterogeneity arising from massive data. In a general nonparametric framework, a set…
In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx\,dy)$, based on…
In this paper, we consider sequential testing in a single-sensor, single-decision-center setup. At each time instant $t$, the sensor gets $k$ samples $(k>0)$ and describes the observed sequence until time $t$ to the decision center over…
In this paper, we consider testing the homogeneity of proportions in independent binomial distributions, especially when data are sparse for a large number of groups. We provide broad aspects of our proposed tests such as theoretical studies,…
Compositional data (i.e., data comprising random variables that sum up to a constant) arises in many applications including microbiome studies, chemical ecology, political science, and experimental designs. Yet when compositional data serve…
Data depth has been applied as a nonparametric measurement for ranking multivariate samples. In this paper, we focus on homogeneity tests to assess whether two multivariate samples are from the same distribution. There are many data…
We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate. Our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite…
In this article, we study the test for independence of two random elements $X$ and $Y$ lying in an infinite dimensional space ${\cal{H}}$ (specifically, a real separable Hilbert space equipped with the inner product $\langle .,…
Nonparametric two-sample testing is a classical problem in inferential statistics. While modern two-sample tests, such as the edge count test and its variants, can handle multivariate and non-Euclidean data, contemporary gargantuan datasets…
Intuitively, if a density operator has small rank, then it should be easier to estimate from experimental data, since in this case only a few eigenvectors need to be learned. We prove two complementary results that confirm this intuition.…
In this paper we propose several variants of the independence test between two random elements based on recurrence rates. We show how to calculate the test statistic in each of these cases. From simulations we obtain that…
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher…
Traditionally, data compression deals with the problem of concisely representing a data source, e.g. a sequence of letters, for the purpose of eventual reproduction (either exact or approximate). In this work we are interested in the case…
Tensor decomposition on big data has attracted significant attention recently. Among the most popular methods is a class of algorithms that leverages compression in order to reduce the size of the tensor and potentially parallelize…
A framework is developed using techniques from rate distortion theory in statistical testing. The idea is first to do optimal compression according to a certain distortion function and then use information divergence from the compressed…
Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially…
We investigate one/two-sample mean tests for high-dimensional compositional data when the number of variables is comparable with the sample size, as commonly encountered in microbiome research. Existing methods mainly focus on max-type test…
Compression aims to reduce the size of an input, while maintaining its relevant properties. For multi-parameter persistent homology, compression is a necessary step in any computational pipeline, since standard constructions lead to large…