Related papers: Robust distance correlation for variable screening

Robust rank correlation based screening

Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p, small n" paradigms when p can be as…

Methodology · Statistics 2012-10-18 Gaorong Li, Heng Peng, Jun Zhang, Lixing Zhu

On the consistency theory of high dimensional variable screening

Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to…

Statistics Theory · Mathematics 2015-06-09 Xiangyu Wang, Chenlei Leng, David B. Dunson

A Robust Partial Correlation-based Screening Approach

As a computationally fast and working efficient tool, sure independence screening has received much attention in solving ultrahigh dimensional problems. This paper contributes two robust sure screening approaches that simultaneously take…

Methodology · Statistics 2021-07-27 Xiaochao Xia

Distribution-free and Model-free Multivariate Feature Screening via Multivariate Rank Distance Correlation

Feature screening approaches are effective in selecting active features from data with ultrahigh dimensionality and increasing complexity; however, the majority of existing feature screening approaches are either restricted to a univariate…

Methodology · Statistics 2023-05-09 Shaofei Zhao, Guifang Fu

Ridge partial correlation screening for ultrahigh-dimensional data

Variable selection in ultrahigh-dimensional linear regression is challenging due to its high computational cost. Therefore, a screening step is usually conducted before variable selection to significantly reduce the dimension. Here we…

Methodology · Statistics 2025-04-29 Run Wang, An Nguyen, Somak Dutta, Vivekananda Roy

Variable screening with multiple studies

Advancement in technology has generated abundant high-dimensional data that allows integration of multiple relevant studies. Due to their huge computational advantage, variable screening methods based on marginal correlation have become…

Methodology · Statistics 2017-10-12 Tianzhou Ma, Zhao Ren, George C. Tseng

Feature Screening via Distance Correlation Learning

This paper is concerned with screening features in ultrahigh dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance…

Methodology · Statistics 2012-06-04 Runze Li, Wei Zhong, Liping Zhu

Using distance covariance for improved variable selection with applications to genetic risk models

Variable selection is of increasing importance to address the difficulties of high dimensionality in many scientific areas. In this paper, we demonstrate a property for distance covariance, which is incorporated in a novel feature screening…

Methodology · Statistics 2014-09-03 Jing Kong, Sijian Wang, Grace Wahba

Contributions to Robust and Efficient Methods for Analysis of High Dimensional Data

A ubiquitous feature of data of our era is their extra-large sizes and dimensions. Analyzing such high-dimensional data poses significant challenges, since the feature dimension is often much larger than the sample size. This thesis…

Statistics Theory · Mathematics 2025-09-11 Kai Yang

Sure Independence Screening for Ultra-High Dimensional Feature Space

Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality $p$, estimation accuracy…

Statistics Theory · Mathematics 2008-08-27 Jianqing Fan, Jinchi Lv

Model-free Feature Screening and FDR Control with Knockoff Features

This paper proposes a model-free and data-adaptive feature screening method for ultra-high dimensional datasets. The proposed method is based on the projection correlation which measures the dependence between two random vectors. This…

Methodology · Statistics 2021-02-16 Wanjun Liu, Yuan Ke, Jingyuan Liu, Runze Li

A robust variable screening procedure for ultra-high dimensional data

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre screening of the variables may be necessary. A…

Methodology · Statistics 2020-05-01 Abhik Ghosh, Magne Thoresen

Grouped feature screening for ultrahigh-dimensional classification via Gini distance correlation

Gini distance correlation (GDC) was recently proposed to measure the dependence between a categorical variable, Y, and a numerical random vector, X. It mutually characterizes independence between X and Y. In this article, we utilize the GDC…

Methodology · Statistics 2023-04-19 Yongli Sang, Xin Dang

Better scalability under potentially heavy-tailed feedback

We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to…

Machine Learning · Statistics 2020-12-15 Matthew J. Holland

Large Scale Correlation Screening

This paper treats the problem of screening for variables with high correlations in high dimensional data in which there can be many fewer samples than variables. We focus on threshold-based correlation screening methods for three related…

Machine Learning · Statistics 2015-03-18 Alfred O. Hero, Bala Rajaratnam

Deep Feature Screening: Feature Selection for Ultra High-Dimensional Data via Deep Neural Networks

The applications of traditional statistical feature selection methods to high-dimension, low sample-size data often struggle and encounter challenging problems, such as overfitting, curse of dimensionality, computational infeasibility, and…

Machine Learning · Statistics 2023-12-19 Kexuan Li, Fangfang Wang, Lingli Yang, Ruiqi Liu

Better scalability under potentially heavy-tailed gradients

We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly…

Machine Learning · Statistics 2020-12-16 Matthew J. Holland

Robust nearest-neighbor methods for classifying high-dimensional data

We suggest a robust nearest-neighbor approach to classifying high-dimensional data. The method enhances sensitivity by employing a threshold and truncates to a sequence of zeros and ones in order to reduce the deleterious impact of…

Statistics Theory · Mathematics 2009-09-02 Yao-ban Chan, Peter Hall

Making Reliable and Flexible Decisions in Long-tailed Classification

Long-tailed classification is challenging due to its heavy imbalance in class probabilities. While existing methods often focus on overall accuracy or accuracy for tail classes, they overlook a critical aspect: certain types of errors can…

Machine Learning · Computer Science 2025-01-27 Bolian Li, Ruqi Zhang

Robust variable screening for regression using factor profiling

Sure Independence Screening is a fast procedure for variable selection in ultra-high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue,…

Methodology · Statistics 2018-11-15 Yixin Wang, Stefan Van Aelst