Related papers: Identifying important predictors in large data bas…

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models

The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…

Software Engineering · Computer Science 2021-03-10 Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu

Stepdown SLOPE for Controlled Feature Selection

Sorted L-One Penalized Estimation (SLOPE) has shown the nice theoretical property as well as empirical behavior recently on the false discovery rate (FDR) control of high-dimensional feature selection by adaptively imposing the…

Statistics Theory · Mathematics 2023-02-22 Jingxuan Liang, Hong Chen, Xuelin Zhang, Weifu Li, Xin Tang

Statistical estimation and testing via the sorted L1 norm

We introduce a novel method for sparse regression and variable selection, which is inspired by modern ideas in multiple testing. Imagine we have observations from the linear model y = X beta + z, then we suggest estimating the regression…

Methodology · Statistics 2013-10-30 Malgorzata Bogdan, Ewout van den Berg, Weijie Su, Emmanuel Candes

Model Selection Techniques -- An Overview

In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are…

Machine Learning · Statistics 2018-10-24 Jie Ding, Vahid Tarokh, Yuhong Yang

Comparison of Multi-response Prediction Methods

While data science is battling to extract information from the enormous explosion of data, many estimators and algorithms are being developed for better prediction. Researchers and data scientists often introduce new methods and evaluate…

Applications · Statistics 2019-05-22 Raju Rimal, Trygve Almøy, Solve Sæbø

Multiple Hypotheses Testing For Variable Selection

Many methods have been developed to estimate the set of relevant variables in a sparse linear model Y= XB+e where the dimension p of B can be much higher than the length n of Y. Here we propose two new methods based on multiple hypotheses…

Statistics Theory · Mathematics 2012-06-12 Florian Rohart

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for…

Machine Learning · Computer Science 2020-11-12 Sebastian Raschka

Contextual Online False Discovery Rate Control

Multiple hypothesis testing, a situation when we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), which…

Statistics Theory · Mathematics 2019-03-19 Shiyun Chen, Shiva Kasiviswanathan

False Discovery Control in Multiple Testing: A Brief Overview of Theories and Methodologies

As the volume and complexity of data continue to expand across various scientific disciplines, the need for robust methods to account for the multiplicity of comparisons has grown widespread. A popular measure of type 1 error rate in…

Methodology · Statistics 2024-11-19 Jianliang He, Bowen Gang, Luella Fu

Deep Neural Network Benchmarks for Selective Classification

With the increasing deployment of machine learning models in many socially sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from…

Machine Learning · Computer Science 2024-09-19 Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri

Robust variable selection for model-based learning in presence of adulteration

The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…

Applications · Statistics 2020-12-16 Andrea Cappozzo, Francesca Greselin, Thomas Brendan Murphy

A new multiple testing method in the dependent case

The most popular multiple testing procedures are stepwise procedures based on P-values for individual test statistics. Included among these are the false discovery rate (FDR) controlling procedures of Benjamini-Hochberg [J. Roy. Statist.…

Statistics Theory · Mathematics 2009-06-18 Arthur Cohen, Harold B. Sackrowitz, Minya Xu

Variable selection for general index models via sliced inverse regression

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics 2014-09-24 Bo Jiang, Jun S. Liu

Model-agnostic Selective Labeling with Provable Statistical Guarantees

Obtaining high-quality labels for large datasets is expensive, requiring massive annotations from human experts. While AI models offer a cost-effective alternative by predicting labels, their label quality is compromised by the unavoidable…

Machine Learning · Computer Science 2026-02-17 Huipeng Huang, Wenbo Liao, Huajun Xi, Hao Zeng, Mengchen Zhao, Hongxin Wei

The Landmark Selection Method for Multiple Output Prediction

Conditional modeling x \to y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high dimensional y, where x is either low…

Machine Learning · Computer Science 2012-07-03 Krishnakumar Balasubramanian, Guy Lebanon

SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning

Although significant progress achieved, multi-label classification is still challenging due to the complexity of correlations among different labels. Furthermore, modeling the relationships between input and some (dull) classes further…

Computer Vision and Pattern Recognition · Computer Science 2020-08-18 Junbing Li, Changqing Zhang, Pengfei Zhu, Baoyuan Wu, Lei Chen, Qinghua Hu

Statistical Efficiency of Single- and Multi-step Models for Forecasting and Control

Compounding error, where small prediction mistakes accumulate over time, presents a major challenge in learning-based control. A common remedy is to train multi-step predictors directly instead of rolling out single-step models. However, it…

Systems and Control · Electrical Eng. & Systems 2026-03-25 Anne Somalwar, Bruce D. Lee, George J. Pappas, Nikolai Matni

Selecting Diverse Models for Scientific Insight

Model selection often aims to choose a single model, assuming that the form of the model is correct. However, there may be multiple possible underlying explanatory patterns in a set of predictors that could explain a response. Model…

Methodology · Statistics 2021-12-17 Laura J. Wendelberger, Brian J. Reich, Alyson G. Wilson

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different approaches in finite-sample…

Methodology · Statistics 2020-01-29 Fan Wang, Sach Mukherjee, Sylvia Richardson, Steven M. Hill

Adaptive FDR control under independence and dependence

In the context of multiple hypotheses testing, the proportion $\pi_0$ of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or…

Statistics Theory · Mathematics 2009-02-17 Gilles Blanchard, Etienne Roquain