Related papers: Generalized Data Thinning Using Sufficient Statist…
Recent work has explored data thinning, a generalization of sample splitting that involves decomposing a (possibly matrix-valued) random variable into independent components. In the special case of an $n \times p$ random matrix with…
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a…
We consider a notion of uniform thinning for a finite sequence of random variables $(X_1,...,X_n)$ obtained by removing one random variable, uniformly at random. If a triangular array of random variables $(X_{n,k} : n \in \mathbb{N}_+, 1…
This study focuses on statistical inference for compound models of the form $X=\xi_1+\ldots+\xi_N$, where $N$ is a random variable denoting the count of summands, which are independent and identically distributed (i.i.d.) random variables…
Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is…
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once. It has been recently shown that these policies can be computed in two steps: first, a suitable…
Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not…
A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach yields a sample whose estimators are inefficient and whose joint distribution is inconsistent with the true…
This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequentially can be…
For general thinning procedures, the inverse operation, condensing, is studied and a link to integration-by-parts formulas is established. This extends the recent results on that link for independent thinnings of point processes to…
Data generalization is a powerful technique for sanitizing multi-attribute data for publication. In a multidimensional model, a subset of attributes called the quasi-identifiers (QI) are used to define the space and a generalization scheme…
In this paper, we consider objective Bayesian inference of the generalized exponential distribution using the independence Jeffreys prior and validate the propriety of the posterior distribution under a family of structured priors. We…
Generalized sampling is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces. It allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with…
Statistical sufficiency formalizes the notion of data reduction. In the decision theoretic interpretation, once a model is chosen all inferences should be based on a sufficient statistic. However, suppose we start with a set of procedures…
In linear inverse problems, we have data derived from a noisy linear transformation of some unknown parameters, and we wish to estimate these unknowns from the data. Separable inverse problems are a powerful generalization in which the…
This article introduces a general statistical modeling principle called "Density Sharpening" and applies it to the analysis of discrete count data. The underlying foundation is based on a new theory of nonparametric approximation and…
We propose and analyze a generalized splitting method to sample approximately from a distribution conditional on the occurrence of a rare event. This has important applications in a variety of contexts in operations research, engineering,…
Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after…
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially…
This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion…