Related papers: Robust distance correlation for variable screening
Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p, small n" paradigms when p can be as…
Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to…
As a computationally fast and working efficient tool, sure independence screening has received much attention in solving ultrahigh dimensional problems. This paper contributes two robust sure screening approaches that simultaneously take…
Feature screening approaches are effective in selecting active features from data with ultrahigh dimensionality and increasing complexity; however, the majority of existing feature screening approaches are either restricted to a univariate…
Variable selection in ultrahigh-dimensional linear regression is challenging due to its high computational cost. Therefore, a screening step is usually conducted before variable selection to significantly reduce the dimension. Here we…
Advancement in technology has generated abundant high-dimensional data that allows integration of multiple relevant studies. Due to their huge computational advantage, variable screening methods based on marginal correlation have become…
This paper is concerned with screening features in ultrahigh dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance…
Variable selection is of increasing importance to address the difficulties of high dimensionality in many scientific areas. In this paper, we demonstrate a property for distance covariance, which is incorporated in a novel feature screening…
A ubiquitous feature of data of our era is their extra-large sizes and dimensions. Analyzing such high-dimensional data poses significant challenges, since the feature dimension is often much larger than the sample size. This thesis…
Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality $p$, estimation accuracy…
This paper proposes a model-free and data-adaptive feature screening method for ultra-high dimensional datasets. The proposed method is based on the projection correlation which measures the dependence between two random vectors. This…
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre screening of the variables may be necessary. A…
Gini distance correlation (GDC) was recently proposed to measure the dependence between a categorical variable, Y, and a numerical random vector, X. It mutually characterizes independence between X and Y. In this article, we utilize the GDC…
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to…
This paper treats the problem of screening for variables with high correlations in high dimensional data in which there can be many fewer samples than variables. We focus on threshold-based correlation screening methods for three related…
The applications of traditional statistical feature selection methods to high-dimension, low sample-size data often struggle and encounter challenging problems, such as overfitting, curse of dimensionality, computational infeasibility, and…
We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly…
We suggest a robust nearest-neighbor approach to classifying high-dimensional data. The method enhances sensitivity by employing a threshold and truncates to a sequence of zeros and ones in order to reduce the deleterious impact of…
Long-tailed classification is challenging due to its heavy imbalance in class probabilities. While existing methods often focus on overall accuracy or accuracy for tail classes, they overlook a critical aspect: certain types of errors can…
Sure Independence Screening is a fast procedure for variable selection in ultra-high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue,…