Related papers: Unifying Boxplots: A Multiple Testing Perspective

When Tukey meets Chauvenet: a new boxplot criterion for outlier detection

The box-and-whisker plot, introduced by Tukey (1977), is one of the most popular graphical methods in descriptive statistics. However, Tukey's boxplot is independent of the sample size, yielding the so-called "one-size-fits-all"…

Methodology · Statistics 2025-06-10 Hongmei Lin, Riquan Zhang, Tiejun Tong

ChauBoxplot and AdaptiveBoxplot: Two R packages for boxplot-based outlier detection

Tukey's boxplot is widely used for outlier detection; however, its classic fixed-fence rule tends to flag an excessive number of outliers as the sample size grows. To address this, we introduce two new R packages, ChauBoxplot and…

Methodology · Statistics 2026-03-04 Tiejun Tong, Hongmei Lin, Bowen Gang, Riquan Zhang

The Bag-and-Whisker Plot: A New Bagplot for Bivariate Data

The bagplot, also known as the "bag-and-bolster plot", is a notable extension of the boxplot from univariate to bivariate data. Although widely used, its practical application is hindered by two key limitations: the fixed inflation factor…

Methodology · Statistics 2025-12-09 Shenghao Qin, Bowen Gang, Tiejun Tong, Hengjian Cui

ggskewboxplots: Enhanced Boxplots for Skewed Data in R

Traditional boxplots are widely used for summarizing and visualizing the distribution of numerical data, yet they exhibit significant limitations when applied to skewed or heavy-tailed distributions, often leading to misclassification of…

Methodology · Statistics 2025-11-24 Mustafa Cavus

Outlier detection and a tail-adjusted boxplot based on extreme value theory

Whether an extreme observation is an outlier depends strongly on the corresponding tail behaviour of the underlying distribution. We develop an automatic, data-driven method to identify extreme tail behaviour that deviates from the…

Methodology · Statistics 2019-12-06 Shrijita Bhattacharya, Jan Beirlant

High-dimensional outlier detection using random projections

There exist multiple methods to detect outliers in multivariate data in the literature, but most of them require estimating the covariance matrix. The higher the dimension, the more complex the estimation of the matrix, becoming impossible…

Methodology · Statistics 2020-12-01 P. Navarro-Esteban, J. A. Cuesta-Albertos

Graphical approaches for the control of generalised error rates

When simultaneously testing multiple hypotheses, the usual approach in the context of confirmatory clinical trials is to control the familywise error rate (FWER), which bounds the probability of making at least one false rejection. In many…

Methodology · Statistics 2021-05-20 David S. Robertson, James M. S. Wason, Frank Bretz

False discovery rate envelopes

False discovery rate (FDR) is a common way to control the number of false discoveries in multiple testing. There are a number of approaches available for controlling FDR. However, for functional test statistics, which are discretized into…

Methodology · Statistics 2024-12-03 Tomáš Mrkvička, Mari Myllymäki

Statistical Depth based Normalization and Outlier Detection of Gene Expression Data

Normalization and outlier detection belong to the preprocessing of gene expression data. We propose a natural normalization procedure based on statistical data depth which normalizes to the distribution of gene expressions of the most…

Methodology · Statistics 2022-06-29 Alicia Nieto-Reyes, Javier Cabrera

Detecting and Classifying Outliers in Big Functional Data

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection…

Methodology · Statistics 2021-10-15 Oluwasegun Taiwo Ojo, Antonio Fernández Anta, Rosa E. Lillo, Carlo Sguera

A practical way to regularize unfolding of sharply varying spectra with low data statistics

Unfolding is a well-established tool in particle physics. However, a naive application of the standard regularization techniques to unfold the momentum spectrum of protons ejected in the process of negative muon nuclear capture led to a…

Data Analysis, Statistics and Probability · Physics 2020-03-18 Andrei Gaponenko

Type I error rate control for testing many hypotheses: a survey with proofs

This paper presents a survey of some recent advances in type I error rate control in multiple testing methodology. We consider the problem of controlling the $k$-family-wise error rate (kFWER, probability of making $k$ false discoveries…

Methodology · Statistics 2011-03-15 Etienne Roquain

Decision Theory For Large Scale Outlier Detection Using Aleatoric Uncertainty: With a Note on Bayesian FDR

Aleatoric and epistemic uncertainty have received recent attention in the literature as different sources from which uncertainty can emerge in stochastic modeling. Epistemic being intrinsic or model-based notions of uncertainty, and…

Methodology · Statistics 2025-08-15 Ryan Warnick

Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal

Deployed language models must decide not only what to answer but also when not to answer. We present UniCR, a unified framework that turns heterogeneous uncertainty evidence including sequence likelihoods, self-consistency dispersion,…

Computation and Language · Computer Science 2025-12-30 Markus Oehri, Giulia Conti, Kaviraj Pather, Alexandre Rossi, Laia Serra, Adrian Parody, Rogvi Johannesen, Aviaja Petersen, Arben Krasniqi

False Discovery Control in Multiple Testing: A Brief Overview of Theories and Methodologies

As the volume and complexity of data continue to expand across various scientific disciplines, the need for robust methods to account for the multiplicity of comparisons has become widespread. A popular measure of type I error rate in…

Methodology · Statistics 2024-11-19 Jianliang He, Bowen Gang, Luella Fu

Principles of Conditionality and Layering of Error Rates with Application to Platform Trials

There has been a misconception that only one type of error rate control is necessary in clinical trials, leading to debates over whether to prioritize Familywise Error Rate (FWER) or False Discovery Rate (FDR). This misconception has led to…

Methodology · Statistics 2026-03-26 Xinping Cui, Emily Ouyang, Yi Liu, Jingjing Yan Schneider, Hong Tian, Bushi Wang, Jason C. Hsu

Regularized Halfspace Depth for Functional Data

Data depth is a powerful nonparametric tool originally proposed to rank multivariate data from the center outward. In this context, one of the most archetypical depth notions is Tukey's halfspace depth. In the last few decades notions of depth…

Methodology · Statistics 2024-05-27 Hyemin Yeon, Xiongtao Dai, Sara Lopez-Pintado

A Robust AUC Maximization Framework with Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification

Positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we only observe a few samples labeled as "positive" together with a large…

Machine Learning · Computer Science 2018-03-20 Ke Ren, Haichuan Yang, Yu Zhao, Mingshan Xue, Hongyu Miao, Shuai Huang, Ji Liu

Functional Outlier Detection and Taxonomy by Sequential Transformations

Functional data analysis can be seriously impaired by abnormal observations, which can be classified as either magnitude or shape outliers according to how they deviate from the bulk of the data. Identifying magnitude outliers is relatively…

Methodology · Statistics 2020-03-24 Wenlin Dai, Tomas Mrkvicka, Ying Sun, Marc G. Genton

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating…

Machine Learning · Computer Science 2022-10-03 Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei