Related papers: Quantile Based Variable Mining : Detection, FDR ba…

Model-free controlled variable selection via data splitting

Addressing the simultaneous identification of contributory variables while controlling the false discovery rate (FDR) in high-dimensional data is a crucial statistical challenge. In this paper, we propose a novel model-free variable…

Methodology · Statistics 2024-04-23 Yixin Han, Xu Guo, Changliang Zou

Quantile Factor Models

Quantile Factor Models (QFM) represent a new class of factor models for high-dimensional panel data. Unlike Approximate Factor Models (AFM), where only location-shifting factors can be extracted, QFM also allow to recover unobserved factors…

Econometrics · Economics 2020-09-24 Liang Chen, Juan Jose Dolado, Jesus Gonzalo

Consistent and Flexible Selectivity Estimation for High-Dimensional Data

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…

Databases · Computer Science 2021-05-28 Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Onizuka Makoto, Wei Wang, Rui Zhang, Yoshiharu Ishikawa

A Feature Selection Method that Controls the False Discovery Rate

The problem of selecting a handful of truly relevant variables in supervised machine learning algorithms is a challenging problem in terms of untestable assumptions that must hold and unavailability of theoretical assurances that selection…

Methodology · Statistics 2023-11-10 Mehdi Rostami, Olli Saarela

Controlling the False Discovery Rate for Binary Feature Selection via Knockoff

Variable selection has been widely used in data analysis for the past decades, and it becomes increasingly important in the Big Data era as there are usually hundreds of variables available in a dataset. To enhance interpretability of a…

Methodology · Statistics 2020-08-17 Yuxiang Xie, Kwun Chuen Gary Chan

High-Dimensional False Discovery Rate Control for Dependent Variables

Algorithms that ensure reproducible findings from large-scale, high-dimensional data are pivotal in numerous signal processing applications. In recent years, multivariate false discovery rate (FDR) controlling methods have emerged,…

Methodology · Statistics 2024-01-31 Jasin Machkour, Michael Muma, Daniel P. Palomar

False Discovery Rate Control via Data Splitting

Selecting relevant features associated with a given response variable is an important issue in many scientific fields. Quantifying quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent…

Methodology · Statistics 2020-12-17 Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu

The quantile-based classifier with variable-wise parameters

Quantile-based classifiers can classify high-dimensional observations by minimising a discrepancy of an observation to a class based on suitable quantiles of the within-class distributions, corresponding to a unique percentage for all…

Methodology · Statistics 2024-04-23 Marco Berrettini, Christian Hennig, Cinzia Viroli

A Novel Feature Selection and Extraction Technique for Classification

This paper presents a versatile technique for the purpose of feature selection and extraction - Class Dependent Features (CDFs). We use CDFs to improve the accuracy of classification and at the same time control computational expense by…

Machine Learning · Computer Science 2014-12-30 Kratarth Goel, Raunaq Vohra, Ainesh Bakshi

Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery

Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data…

Statistics Theory · Mathematics 2007-06-13 Jianqing Fan, Runze Li

False Variable Selection Rates in Regression

There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally the FDR in these settings has been defined in terms of the coefficients of the full regression…

Methodology · Statistics 2013-02-12 Max Grazier G'Sell, Trevor Hastie, Robert Tibshirani

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard, Jeffrey Näf, Julie Josse

A Data-Adaptive Factor Model Using Composite Quantile Approach

This paper proposes a data-adaptive factor model (DAFM), a novel framework for extracting common factors that explain the structures of high-dimensional data. DAFM adopts a composite quantile strategy to adaptively capture the full…

Methodology · Statistics 2025-10-02 Seeun Park, Hee-Seok Oh

Unsupervised Variable Selection for Ultrahigh-Dimensional Clustering Analysis

Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…

Methodology · Statistics 2024-12-02 Tonglin Zhang, Huyunting Huang

One-dimensional quantile-stratified sampling and its application in statistical simulations

In this paper we examine quantile-stratified samples from a known univariate probability distribution, with stratification occurring over a partition of the quantile regions in the distribution. We examine some general properties of this…

Methodology · Statistics 2025-09-09 Ben O'Neill

Feature and Variable Selection in Classification

The amount of information in the form of features and variables available to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high dimensional models do not…

Machine Learning · Computer Science 2014-02-12 Aaron Karper

A Computationally Efficient Approach to False Discovery Rate Control and Power Maximisation via Randomisation and Mirror Statistic

Simultaneously performing variable selection and inference in high-dimensional regression models is an open challenge in statistics and machine learning. The increasing availability of vast amounts of variables requires the adoption of…

Methodology · Statistics 2025-05-08 Marco Molinari, Magne Thoresen

Discovering Conditionally Salient Features with Statistical Guarantees

The goal of feature selection is to identify important features that are relevant to explain an outcome variable. Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to…

Machine Learning · Statistics 2019-05-30 Jaime Roquero Gimenez, James Zou

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Flexible Model Aggregation for Quantile Regression

Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost…

Machine Learning · Statistics 2023-04-18 Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani