Related papers: Bagging multiple comparisons from microarray data

Simultaneous inference: When should hypothesis testing problems be combined?

Modern statisticians are often presented with hundreds or thousands of hypothesis testing problems to evaluate at the same time, generated from new scientific technologies such as microarrays, medical and satellite imaging devices, or flow…

Applications · Statistics 2008-12-18 Bradley Efron

Higher-order accurate two-sample network inference and network hashing

Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration, but without requiring them to operate; relaxing strong structural…

Methodology · Statistics 2024-02-05 Meijia Shao, Dong Xia, Yuan Zhang, Qiong Wu, Shuo Chen

Magging: maximin aggregation for inhomogeneous large-scale data

Large-scale data analysis poses both statistical and computational problems which need to be addressed simultaneously. A solution is often straightforward if the data are homogeneous: one can use classical ideas of subsampling and mean…

Methodology · Statistics 2014-09-10 Peter Bühlmann, Nicolai Meinshausen

Significant Subgraph Mining with Multiple Testing Correction

The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a…

Methodology · Statistics 2015-02-02 Mahito Sugiyama, Felipe Llinares López, Niklas Kasenburg, Karsten M. Borgwardt

Multiple Hypothesis Testing in Pattern Discovery

The problem of multiple hypothesis testing arises when there is more than one hypothesis to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, assessing…

Machine Learning · Statistics 2009-06-30 Sami Hanhijärvi, Kai Puolamäki, Gemma C. Garriga

Size, power and false discovery rates

Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations…

Statistics Theory · Mathematics 2007-11-06 Bradley Efron

Hypothesis Testing for Network Data with Power Enhancement

Comparing two population means of network data is of paramount importance in a wide range of scientific applications. Many existing network inference solutions focus on global testing of entire networks, without comparing individual network…

Methodology · Statistics 2019-10-10 Yin Xia, Lexin Li

Reducing Sampling Ratios Improves Bagging in Sparse Regression

Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in…

Machine Learning · Statistics 2019-05-03 Luoluo Liu, Sang Peter Chin, Trac D. Tran

Bagging and Boosting a Treebank Parser

Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as…

Computation and Language · Computer Science 2007-05-23 John C. Henderson, Eric Brill

Using prior information to boost power in correlation structure support recovery

Hypothesis testing of structure in correlation and covariance matrices is of broad interest in many application areas. In high dimensions and/or small to moderate sample sizes, high error rates in testing are a substantial concern. This…

Methodology · Statistics 2026-01-07 Ziyang Ding, David Dunson

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

Finding Associations and Computing Similarity via Biased Pair Sampling

This version is ***superseded*** by a full version that can be found at http://www.itu.dk/people/pagh/papers/mining-jour.pdf, which contains stronger theoretical results and fixes a mistake in the reporting of experiments. Abstract:…

Data Structures and Algorithms · Computer Science 2010-02-17 Andrea Campagna, Rasmus Pagh

Simultaneous hypothesis testing for comparing many functional means

Data with multiple functional recordings at each observational unit are increasingly common in various fields including medical imaging and environmental sciences. To conduct inference for such observations, we develop a paired two-sample…

Methodology · Statistics 2025-06-16 Colin Decker, Dehan Kong, Stanislav Volgushev

A Bandit Approach to Multiple Testing with False Discovery Control

We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. We consider $n$ distributions whose means are partitioned by whether they are below or…

Machine Learning · Statistics 2019-07-18 Kevin Jamieson, Lalit Jain

When does Subagging Work?

We study the effectiveness of subagging, or subsample aggregating, on regression trees, a popular non-parametric method in machine learning. First, we give sufficient conditions for pointwise consistency of trees. We formalize that (i) the…

Machine Learning · Statistics 2024-04-03 Christos Revelas, Otilia Boldea, Bas J. M. Werker

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimizers and…

Machine Learning · Computer Science 2026-05-21 Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu

Multiple testing for signal-agnostic searches of new physics with machine learning

In this work, we address the question of how to enhance signal-agnostic searches by leveraging multiple testing strategies. Specifically, we consider hypothesis tests relying on machine learning, where model selection can introduce a bias…

High Energy Physics - Phenomenology · Physics 2024-08-23 Gaia Grosso, Marco Letizia

A Coreset Learning Reality Check

Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information…

Machine Learning · Computer Science 2023-01-18 Fred Lu, Edward Raff, James Holt

Simultaneous testing of hypotheses and alternatives

To identify statistically significant conclusions, it is proposed to simultaneously test hypotheses and alternatives. It is shown that, under the condition of free combination of hypotheses and alternatives, the closure method leads to…

Methodology · Statistics 2025-09-15 P. A. Koldanov, A. P. Koldanov

Combining support for hypotheses over heterogeneous studies with Bayesian Evidence Synthesis: A simulation study

Scientific claims gain credibility by replicability, especially if replication under different circumstances and varying designs yields equivalent results. Aggregating results over multiple studies is, however, not straightforward, and when…

Methodology · Statistics 2023-12-27 Thom Benjamin Volker, Irene Klugkist