Related papers: P-values for high-dimensional regression

Two-Sample Testing in High-Dimensional Models

We propose novel methodology for testing equality of model parameters between two high-dimensional populations. The technique is very general and applicable to a wide range of models. The method is based on sample splitting: the data is…

Methodology · Statistics 2013-01-17 Nicolas Städler, Sach Mukherjee

Selective inference is easier with p-values

Selective inference is a subfield of statistics that enables valid inference after selection of a data-dependent question. In this paper, we introduce selectively dominant p-values, a class of p-values that allow practitioners to easily…

Methodology · Statistics 2024-11-22 Anav Sood

Random Partitioning and Distribution-based Thresholding for Iterative Variable Screening in High Dimensions

In big data analysis, a simple task such as linear regression can become very challenging as the variable dimension $p$ grows. As a result, variable screening is inevitable in many scientific studies. In recent years, randomized algorithms…

Methodology · Statistics 2019-02-13 Yu-Hsiang Cheng, Tzee-Ming Huang, Su-Yun Huang

Randomized p-values for multiple testing and their application in replicability analysis

We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional $p$-values, which are computed under least favourable parameter…

Methodology · Statistics 2020-02-26 Anh-Tuan Hoang, Thorsten Dickhaus

Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values

Many testing problems are readily amenable to randomised tests such as those employing data splitting. However despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may…

Methodology · Statistics 2024-09-05 F. Richard Guo, Rajen D. Shah

Second-generation p-values: improved rigor, reproducibility, & transparency in statistical analyses

Verifying that a statistically significant result is scientifically meaningful is not only good scientific practice, it is a natural way to control the Type I error rate. Here we introduce a novel extension of the p-value - a…

Methodology · Statistics 2018-07-04 Jeffrey D. Blume, Lucy D'Agostino McGowan, William D. Dupont, Robert A. Greevy

Boosted p-Values for High-Dimensional Vector Autoregression

Assessing the statistical significance of parameter estimates is an important step in high-dimensional vector autoregression modeling. Using the least-squares boosting method, we compute the p-value for each selected parameter at every…

Econometrics · Economics 2023-03-16 Xiao Huang

Statistical significance in high-dimensional linear models

We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all…

Methodology · Statistics 2013-10-14 Peter Bühlmann

Multiple testing of composite null hypotheses for discrete data using randomized $p$-values

$P$-values that are derived from continuously distributed test statistics are typically uniformly distributed on $(0,1)$ under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a $p$-value $P$…

Methodology · Statistics 2023-03-13 Daniel Ochieng, Anh-Tuan Hoang, Thorsten Dickhaus

Minimally Discrete and Minimally Randomized p-Values

In meta analysis, multiple hypothesis testing and many other methods, p-values are utilized as inputs and assumed to be uniformly distributed over the unit interval under the null hypotheses. If data used to generate p-values have discrete…

Methodology · Statistics 2026-02-24 Joshua Habiger, Pratyaydipta Rudra

P-values: misunderstood and misused

P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool to test the significance of…

Applications · Statistics 2023-01-05 Bertie Vidgen, Taha Yasseri

Distributions associated with simultaneous multiple hypothesis testing

We develop the distribution of the number of hypotheses found to be statistically significant using the rule from Benjamini and Hochberg (1995) for controlling the false discovery rate (FDR). This distribution has both a small sample form…

Methodology · Statistics 2018-02-27 Chang Yu, Daniel Zelterman

On p-value combination of independent and frequent signals: asymptotic efficiency and Fisher ensemble

Combining p-values to integrate multiple effects is of long-standing interest in social science and biomedical research. In this paper, we focus on revisiting a classical scenario closely related to meta-analysis, which combines a…

Methodology · Statistics 2022-04-15 Yusi Fang, Chung Chang, George Tseng

False Discovery Rate Control via Data Splitting

Selecting relevant features associated with a given response variable is an important issue in many scientific fields. Quantifying quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent…

Methodology · Statistics 2020-12-17 Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu

Randomization Inference with Sample Attrition

Randomization inference is a widely-used and appealing approach for analyzing treatment effects in randomized experiments, as it is finite-sample valid and does not require any distributional assumptions. However, naive application of…

Econometrics · Economics 2026-05-12 Xinran Li, Peizan Sheng, Zeyang Yu

Posterior predictive p-values and the convex order

Posterior predictive p-values are a common approach to Bayesian model-checking. This article analyses their frequency behaviour, that is, their distribution when the parameters and the data are drawn from the prior and the model…

Statistics Theory · Mathematics 2015-03-31 Patrick Rubin-Delanchy, Daniel John Lawson

Splitting strategies for post-selection inference

We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection…

Methodology · Statistics 2022-12-07 Daniel G. Rasines, G. Alastair Young

Modeling High-Dimensional Dependent Data in the Presence of Many Explanatory Variables and Weak Signals

This article considers a novel and widely applicable approach to modeling high-dimensional dependent data when a large number of explanatory variables are available and the signal-to-noise ratio is low. We postulate that a $p$-dimensional…

Methodology · Statistics 2024-12-09 Zhaoxing Gao, Ruey S. Tsay

On estimation of the noise variance in high-dimensional probabilistic principal component analysis

In this paper, we develop new statistical theory for probabilistic principal component analysis models in high dimensions. The focus is the estimation of the noise variance, which is an important and unresolved issue when the number of…

Statistics Theory · Mathematics 2014-06-23 Damien Passemier, Zhaoyuan Li, Jian-Feng Yao

Partially Bayes p-values for large scale inference

We seek to conduct statistical inference for a large collection of primary parameters, each with its own nuisance parameters. Our approach is partially Bayesian, in that we treat the primary parameters as fixed while we model the nuisance…

Methodology · Statistics 2025-12-10 Nikolaos Ignatiadis, Li Ma