Related papers: Generalized Data Thinning Using Sufficient Statist…

Thinning a Wishart Random Matrix

Recent work has explored data thinning, a generalization of sample splitting that involves decomposing a (possibly matrix-valued) random variable into independent components. In the special case of an $n \times p$ random matrix with…

Methodology · Statistics 2025-12-16 Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Daniela Witten, Jacob Bien

Data thinning for convolution-closed distributions

We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a…

Methodology · Statistics 2023-11-22 Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten

A Thinning Analogue of de Finetti's Theorem

We consider a notion of uniform thinning for a finite sequence of random variables $(X_1,...,X_n)$ obtained by removing one random variable, uniformly at random. If a triangular array of random variables $(X_{n,k} : n \in \mathbb{N}_+, 1…

Probability · Mathematics 2007-05-23 Shannon Starr

Decompounding Under General Mixing Distributions

This study focuses on statistical inference for compound models of the form $X=\xi_1+\ldots+\xi_N$, where $N$ is a random variable denoting the count of summands, which are independent and identically distributed (i.i.d.) random variables…

Statistics Theory · Mathematics 2025-07-22 Denis Belomestny, Ekaterina Morozova, Vladimir Panov

Data fission: splitting a single data point

Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is…

Methodology · Statistics 2023-12-12 James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas

Learning General Policies from Small Examples Without Supervision

Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once. It has been recently shown that these policies can be computed in two steps: first, a suitable…

Artificial Intelligence · Computer Science 2021-02-19 Guillem Francès, Blai Bonet, Hector Geffner

Decomposing Gaussians with Unknown Covariance

Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not…

Methodology · Statistics 2025-12-16 Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten

One Step to Efficient Synthetic Data

A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true…

Statistics Theory · Mathematics 2026-02-18 Jordan Awan, Zhanrui Cai

Variable Partitioning for Distributed Optimization

This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequentially can be…

Optimization and Control · Mathematics 2017-10-26 Yuchen Zheng, Ilbin Lee, Nicoleta Serban

General thinning characterizations of distributions and point processes

For general thinning procedures, the inverse operation, condensing, is studied and a link to integration-by-parts formulas is established. This extends the recent results on that link for independent thinnings of point processes to…

Probability · Mathematics 2017-04-26 Mathias Rafler

Contrained Generalization For Data Anonymization - A Systematic Search Based Approach

Data generalization is a powerful technique for sanitizing multi-attribute data for publication. In a multidimensional model, a subset of attributes called the quasi-identifiers (QI) are used to define the space and a generalization scheme…

Databases · Computer Science 2021-08-12 Bijit Hore, Ravi Jammalamadaka, Sharad Mehrotra, Amedeo D'Ascanio

Objective Bayesian analysis for the generalized exponential distribution

In this paper, we consider objective Bayesian inference of the generalized exponential distribution using the independence Jeffreys prior and validate the propriety of the posterior distribution under a family of structured priors. We…

Methodology · Statistics 2023-09-26 Aojun Li, Keying Ye, Min Wang

Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem

Generalized sampling is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces. It allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with…

Numerical Analysis · Mathematics 2013-01-15 Ben Adcock, Anders C. Hansen, Clarice Poon

Group Invariance and Computational Sufficiency

Statistical sufficiency formalizes the notion of data reduction. In the decision theoretic interpretation, once a model is chosen all inferences should be based on a sufficient statistic. However, suppose we start with a set of procedures…

Statistics Theory · Mathematics 2018-08-01 Vincent Q. Vu

A generalization of variable elimination for separable inverse problems beyond least squares

In linear inverse problems, we have data derived from a noisy linear transformation of some unknown parameters, and we wish to estimate these unknowns from the data. Separable inverse problems are a powerful generalization in which the…

Optimization and Control · Mathematics 2015-06-12 Paul Shearer, Anna C. Gilbert

Density Sharpening: Principles and Applications to Discrete Data Analysis

This article introduces a general statistical modeling principle called "Density Sharpening" and applies it to the analysis of discrete count data. The underlying foundation is based on a new theory of nonparametric approximation and…

Methodology · Statistics 2021-08-24 Subhadeep Mukhopadhyay

Sampling Conditionally on a Rare Event via Generalized Splitting

We propose and analyze a generalized splitting method to sample approximately from a distribution conditional on the occurrence of a rare event. This has important applications in a variety of contexts in operations research, engineering,…

Methodology · Statistics 2019-09-10 Zdravko I. Botev, Pierre L'Ecuyer

Supervised Quantile Normalisation

Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after…

Machine Learning · Statistics 2017-06-02 Marine Le Morvan, Jean-Philippe Vert

Low-Rank Thinning

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially…

Machine Learning · Statistics 2026-03-03 Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Deep Knockoffs

This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion…

Methodology · Statistics 2020-03-03 Yaniv Romano, Matteo Sesia, Emmanuel J. Candès