Related papers: Design based incomplete U-statistics

Randomized incomplete $U$-statistics in high dimensions

This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of Big Data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations tend to be both large, and the computation of…

Statistics Theory · Mathematics 2019-01-29 Xiaohui Chen, Kengo Kato

Exponential finite sample bounds for incomplete U-statistics

Incomplete U-statistics have been proposed to accelerate computation. They use only a subset of the subsamples required for kernel evaluations by complete U-statistics. This paper gives a finite sample bound in the style of Bernstein's…

Statistics Theory · Mathematics 2022-07-08 Andreas Maurer
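The entry above describes incomplete U-statistics as averaging the kernel over only a subset of the index tuples that the complete U-statistic uses. Below is a minimal Python sketch of that idea. It is not taken from any of the listed papers. It assumes a symmetric kernel and a simple sampling scheme: index subsets drawn uniformly at random, with replacement across draws. The function names `complete_u_statistic` and `incomplete_u_statistic` are hypothetical.

```python
import itertools
import random


def complete_u_statistic(x, kernel, degree):
    """Average a symmetric kernel over every size-`degree` subset of x."""
    total = 0.0
    count = 0
    for subset in itertools.combinations(x, degree):
        total += kernel(*subset)
        count += 1
    return total / count


def incomplete_u_statistic(x, kernel, degree, num_subsets, seed=0):
    """Average the kernel over `num_subsets` randomly drawn index subsets.

    Assumption (illustrative only): each draw picks `degree` distinct
    indices uniformly at random, and draws are independent, so the same
    subset may be selected more than once (sampling with replacement).
    """
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(num_subsets):
        idx = rng.sample(range(n), degree)  # distinct indices within one draw
        total += kernel(*(x[i] for i in idx))
    return total / num_subsets


# Kernel h(a, b) = (a - b)^2 / 2: its complete U-statistic is the
# unbiased sample variance.
def variance_kernel(a, b):
    return (a - b) ** 2 / 2


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(complete_u_statistic(data, variance_kernel, 2))
    print(incomplete_u_statistic(data, variance_kernel, 2, num_subsets=20000))
```

With n observations and degree k, the complete statistic needs C(n, k) kernel evaluations, while the incomplete version needs only `num_subsets` of them. Trading that cost against added variance is the trade-off the papers above study.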

Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain: their high…

Statistics Theory · Mathematics 2026-02-19 Cesare Miglioli, Jordan Awan

On computing and the complexity of computing higher-order $U$-statistics, exactly

Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their…

Machine Learning · Statistics 2026-04-01 Xingyu Chen, Ruiqi Zhang, Lin Liu

Dimension-agnostic inference using cross U-statistics

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards…

Statistics Theory · Mathematics 2024-05-14 Ilmun Kim, Aaditya Ramdas

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of patterns in high dimensional data with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when…

Methodology · Statistics 2018-06-01 Marcio Valk, Gabriela Bettella Cybis

Beyond Basic A/B testing: Improving Statistical Efficiency for Business Growth

Standard A/B testing approaches in large-scale industry applications are mostly based on the t-test. These standard approaches, however, suffer from low statistical power in business settings due to the nature of small sample sizes or…

Methodology · Statistics 2025-12-30 Changshuai Wei, Phuc Nguyen, Benjamin Zelditch, Joyce Chen

Semi-Supervised U-statistics

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the…

Statistics Theory · Mathematics 2024-03-12 Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov

Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. functionals of the training data with low variance…

Machine Learning · Statistics 2019-01-25 Stéphan Clémençon, Aurélien Bellet, Igor Colin

Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require…

Machine Learning · Statistics 2025-12-17 Aaron Wei, Milad Jalali, Danica J. Sutherland

U-Statistic Reduction: Higher-Order Accurate Risk Control and Statistical-Computational Trade-Off, with Application to Network Method-of-Moments

U-statistics play central roles in many statistical learning tools but face the haunting issue of scalability. Significant efforts have been devoted to accelerating computation by U-statistic reduction. However, existing results almost…

Methodology · Statistics 2023-06-07 Meijia Shao, Dong Xia, Yuan Zhang

U-Statistics for Importance-Weighted Variational Inference

We propose the use of U-statistics to reduce variance for gradient estimation in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires $m > 1$ samples and a total of $n > m$…

Machine Learning · Computer Science 2023-02-28 Javier Burroni, Kenta Takatsu, Justin Domke, Daniel Sheldon

Asymptotics of Randomly Weighted u- and v-statistics: Application to Bootstrap

This paper is mainly concerned with asymptotic studies of weighted bootstrap for u- and v-statistics. We derive the consistency of the weighted bootstrap u- and v-statistics, based on i.i.d. and non i.i.d. observations, from some more…

Statistics Theory · Mathematics 2012-10-23 Miklos Csorgo, Masoud M. Nasari

On the EM-Tau algorithm: a new EM-style algorithm with partial E-steps

The EM algorithm is one of many important tools in the field of statistics. While often used for imputing missing data, its widespread applications include other common statistical tasks, such as clustering. In clustering, the EM algorithm…

Machine Learning · Statistics 2017-11-22 Val Andrei Fajardo, Jiaxi Liang

On the Subbagging Estimation for Massive Data

This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis with memory constraints of computers. Specifically, for the whole dataset with size $N$, $m_N$ subsamples are randomly drawn, and each…

Methodology · Statistics 2021-03-05 Tao Zou, Xian Li, Xuan Liang, Hansheng Wang

Learning U-Statistics with Active Inference

$U$-statistics play a central role in statistical inference. In many modern applications, however, acquiring the labels required for $U$-statistics is costly. Motivated by recent advances in active inference, we develop an active inference…

Machine Learning · Statistics 2026-05-13 Xiaoning Wang, Yuyang Huo, Liuhua Peng, Changliang Zou

Bootstrap for U-Statistics: A new approach

Bootstrap for nonlinear statistics like U-statistics of dependent data has been studied by several authors. This is typically done by producing a bootstrap version of the sample and plugging it into the statistic. We suggest an alternative…

Statistics Theory · Mathematics 2015-05-28 Olimjon Sh. Sharipov, Johannes Tewes, Martin Wendler

Quantum U-statistics

The notion of a $U$-statistic for an $n$-tuple of identical quantum systems is introduced in analogy to the classical (commutative) case: given a selfadjoint 'kernel' $K$ acting on $(\mathbb{C}^{d})^{\otimes r}$ with $r<n$, we define the…

Quantum Physics · Physics 2011-06-23 Madalin Guta, Cristina Butucea

Higher Order Estimating Equations for High-dimensional Models

We introduce a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher order…

Methodology · Statistics 2023-07-14 James Robins, Lingling Li, Rajarshi Mukherjee, Eric Tchetgen Tchetgen, Aad van der Vaart

Approximating high-dimensional infinite-order $U$-statistics: statistical and computational guarantees

We study the problem of distributional approximations to high-dimensional non-degenerate $U$-statistics with random kernels of diverging orders. Infinite-order $U$-statistics (IOUS) are a useful tool for constructing simultaneous prediction…

Statistics Theory · Mathematics 2019-12-11 Yanglei Song, Xiaohui Chen, Kengo Kato