Related papers: A Semi-Supervised Kernel Two-Sample Test

200 papers

Kernel methods provide a flexible and powerful framework for nonparametric statistical testing by embedding probability distributions into a reproducing kernel Hilbert space (RKHS). In this work, we study the kernel two-sample testing…

Statistics Theory · Mathematics 2026-04-09 Perrine Lacroix, Bertrand Michel, Franck Picard, Vincent Rivoirard

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are…

Machine Learning · Statistics 2020-03-11 Matthias Kirchler, Shahryar Khorasani, Marius Kloft, Christoph Lippert

This paper is concerned with testing normality in a Hilbert space based on the maximum mean discrepancy. Specifically, we discuss the behavior of the test from two standpoints: asymptotics and practical aspects. Asymptotic normality of the…

Statistics Theory · Mathematics 2019-02-12 Natsumi Makigusa, Kanta Naito

Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This paper presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can…

Methodology · Statistics 2024-08-12 Hao Chen, Jerome H. Friedman

Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do…

Methodology · Statistics 2023-11-21 Hoseung Song, Hao Chen

The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…

Machine Learning · Statistics 2017-11-15 Hamed Masnadi-Shirazi

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the…

Statistics Theory · Mathematics 2024-03-12 Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov

In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…

Methodology · Statistics 2022-01-25 Qinyi Zhang, Veit Wild, Sarah Filippi, Seth Flaxman, Dino Sejdinovic

We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of…

Methodology · Statistics 2018-08-15 Anru Zhang, Lawrence D. Brown, T. Tony Cai

Kernel two-sample tests have been widely used, and the development of efficient methods for high-dimensional, large-scale data is receiving increasing attention in the big data era. However, existing methods, such as the maximum mean…

Methodology · Statistics 2025-10-03 Hoseung Song, Hao Chen

This study investigates treatment effect estimation in the semi-supervised setting, which can also be interpreted as prediction-powered inference. In our setting, we can use not only the standard triple of covariates, treatment indicator, and…

Machine Learning · Statistics 2026-05-05 Masahiro Kato

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…

Machine Learning · Computer Science 2022-07-20 Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

To adapt kernel two-sample and independence testing to complex structured data, aggregation of multiple kernels is frequently employed to boost testing power compared to single-kernel tests. However, we observe a phenomenon that directly…

Machine Learning · Computer Science 2025-10-14 Zhijian Zhou, Xunye Tian, Liuhua Peng, Chao Lei, Antonin Schrab, Danica J. Sutherland, Feng Liu

Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper we propose new theoretical tools that can be used to study the…

Statistics Theory · Mathematics 2022-09-02 Tamara Fernández, Nicolás Rivera

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current…

Statistics Theory · Mathematics 2014-11-25 Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

In this paper, we address the problem of two-sample testing in the presence of missing data under a variety of missingness mechanisms. Our focus is on the well-known energy distance-based two-sample test. In addition to the standard…

Methodology · Statistics 2025-08-18 Danijel G. Aleksić, Bojana Milošević

The available data in semi-supervised learning usually consists of relatively small sized labeled data and much larger sized unlabeled data. How to effectively exploit unlabeled data is the key issue. In this paper, we write the regression…

Methodology · Statistics 2024-11-13 Ziwen Gao, Huihang Liu, Xinyu Zhang

We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability…

Machine Learning · Statistics 2025-09-16 Leonardo V. Santoro, Victor M. Panaretos

We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data,…

Machine Learning · Statistics 2023-07-21 Aleksandr Podkopaev, Aaditya Ramdas

Data depth has been applied as a nonparametric measurement for ranking multivariate samples. In this paper, we focus on homogeneity tests to assess whether two multivariate samples are from the same distribution. There are many data…

Statistics Theory · Mathematics 2023-06-09 Yiting Chen, Wei Lin, Xiaoping Shi