Related papers: Statistical Tests for Large Tree-structured Data

Testing statistical hypothesis on random trees and applications to the protein classification problem

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain…

Statistics Theory · Mathematics 2009-08-25 Jorge R. Busch, Pablo A. Ferrari, Ana Georgina Flesia, Ricardo Fraiman, Sebastian P. Grynberg, Florencia Leonardi

Tree-based methods for estimating heterogeneous model performance and model combining

Model performance is frequently reported only for the overall population under consideration. However, due to heterogeneity, overall performance measures often do not accurately represent model performance within specific subgroups. We…

Methodology · Statistics 2025-06-03 Ruotao Zhang, Constantine Gatsonis, Jon Steingrimsson

On goodness-of-fit tests for arbitrary multivariate models

Goodness-of-fit tests are often used in data analysis to test the agreement of a distribution to a set of data. These tests can be used to detect an unknown signal against a known background or to set limits on a proposed signal…

Methodology · Statistics 2023-03-20 Lolian Shtembari, Allen Caldwell

A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions

The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is…

Statistics Theory · Mathematics 2019-04-18 Feras A. Saad, Cameron E. Freer, Nathanael L. Ackerman, Vikash K. Mansinghka

Testing for Typicality with Respect to an Ensemble of Learned Distributions

Methods of performing anomaly detection on high-dimensional data sets are needed, since algorithms which are trained on data are only expected to perform well on data that is similar to the training data. There are theoretical results on…

Machine Learning · Computer Science 2020-11-13 Forrest Laine, Claire Tomlin

A distance based test on random trees

In this paper, we address the question of comparison between populations of trees. We study a statistical test based on the distance between empirical mean trees, as an analog of the two sample z statistic for comparing two means. Despite…

Statistics Theory · Mathematics 2007-08-14 Ana Georgina Flesia, Ricardo Fraiman

Goodness-of-fit testing in high-dimensional generalized linear models

We propose a family of tests to assess the goodness-of-fit of a high-dimensional generalized linear model. Our framework is flexible and may be used to construct an omnibus test or directed against testing specific non-linearities and…

Methodology · Statistics 2019-11-14 Jana Janková, Rajen D. Shah, Peter Bühlmann, Richard J. Samworth

Composite goodness-of-fit test with the Kernel Stein Discrepancy and a bootstrap for degenerate U-statistics with estimated parameters

This paper formally derives the asymptotic distribution of a goodness-of-fit test based on the Kernel Stein Discrepancy introduced in (Oscar Key et al., "Composite Goodness-of-fit Tests with Kernels", Journal of Machine Learning Research…

Statistics Theory · Mathematics 2026-02-24 Florian Brück, Veronika Reimoser, Fabian Baier

Weighted Sum-of-Trees Model for Clustered Data

Clustered data, which arise when observations are nested within groups, are incredibly common in clinical, education, and social science research. Traditionally, a linear mixed model, which includes random effects to account for…

Methodology · Statistics 2026-02-04 Kevin McCoy, Zachary Wooten, Katarzyna Tomczak, Christine B. Peterson

A Kernel Test of Goodness of Fit

We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence…

Machine Learning · Statistics 2016-09-28 Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton

Asymptotic Properties of High-Dimensional Random Forests

As a flexible nonparametric learning tool, the random forests algorithm has been widely applied to various real applications with appealing empirical performance, even in the presence of high-dimensional feature space. Unveiling the…

Statistics Theory · Mathematics 2022-09-27 Chien-Ming Chi, Patrick Vossler, Yingying Fan, Jinchi Lv

Goodness-of-fit tests for stochastic frontier models based on the characteristic function

We consider goodness-of-fit tests for the distribution of the composed error in Stochastic Frontier Models. The proposed test statistic utilizes the characteristic function of the composed error term, and is formulated as a weighted…

Statistics Theory · Mathematics 2022-03-01 Simos G. Meintanis, Christos K. Papadimitriou

Composite Goodness-of-fit Tests with Kernels

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…

Machine Learning · Statistics 2025-04-22 Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

Data-Driven Tree Transforms and Metrics

We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the…

Machine Learning · Statistics 2017-08-22 Gal Mishne, Ronen Talmon, Israel Cohen, Ronald R. Coifman, Yuval Kluger

Goodness-of-Fit Tests for Large Datasets

Nowadays, data analysis in the world of Big Data is connected typically to data mining, descriptive or exploratory statistics, e.g. cluster analysis, classification or regression analysis. Aside from these techniques there is a huge area of…

Applications · Statistics 2018-10-24 Taras Lazariv, Christoph Lehmann

Conditional Distribution Model Specification Testing Using Chi-Square Goodness-of-Fit Tests

This paper introduces chi-square goodness-of-fit tests to check for conditional distribution model specification. The data is cross-classified according to the Rosenblatt transform of the dependent variable and the explanatory variables,…

Econometrics · Economics 2023-09-25 Miguel A. Delgado, Julius Vainora

On Determining the Distribution of a Goodness-of-Fit Test Statistic

We consider the problem of goodness-of-fit testing for a model that has at least one unknown parameter that cannot be eliminated by transformation. Examples of such problems can be as simple as testing whether a sample consists of…

Methodology · Statistics 2021-04-28 Sean van der Merwe

A goodness-of-fit test for stochastic block models

The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the largest singular value of a residual matrix…

Statistics Theory · Mathematics 2016-01-22 Jing Lei

A New Bootstrap Goodness-of-Fit Test for Normal Linear Regression Models

In this work, the distributional properties of the goodness-of-fit term in likelihood-based information criteria are explored. These properties are then leveraged to construct a novel goodness-of-fit test for normal linear regression models…

Methodology · Statistics 2023-09-20 Scott H. Koeneman, Joseph E. Cavanaugh

A new set of tools for goodness-of-fit validation

We introduce two new tools to assess the validity of statistical distributions. These tools are based on components derived from a new statistical quantity, the comparison curve. The first tool is a graphical representation of these…

Methodology · Statistics 2024-05-16 Gilles R. Ducharme, Teresa Ledwina