Related papers: Quantile Based Variable Mining : Detection, FDR ba…
Addressing the simultaneous identification of contributory variables while controlling the false discovery rate (FDR) in high-dimensional data is a crucial statistical challenge. In this paper, we propose a novel model-free variable…
Quantile Factor Models (QFM) represent a new class of factor models for high-dimensional panel data. Unlike Approximate Factor Models (AFM), where only location-shifting factors can be extracted, QFM also allow to recover unobserved factors…
Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…
The problem of selecting a handful of truly relevant variables in supervised machine learning algorithms is a challenging problem in terms of untestable assumptions that must hold and unavailability of theoretical assurances that selection…
Variable selection has been widely used in data analysis for the past decades, and it becomes increasingly important in the Big Data era as there are usually hundreds of variables available in a dataset. To enhance interpretability of a…
Algorithms that ensure reproducible findings from large-scale, high-dimensional data are pivotal in numerous signal processing applications. In recent years, multivariate false discovery rate (FDR) controlling methods have emerged,…
Selecting relevant features associated with a given response variable is an important issue in many scientific fields. Quantifying quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent…
Quantile-based classifiers can classify high-dimensional observations by minimising a discrepancy of an observation to a class based on suitable quantiles of the within-class distributions, corresponding to a unique percentage for all…
This paper presents a versatile technique for the purpose of feature selection and extraction - Class Dependent Features (CDFs). We use CDFs to improve the accuracy of classification and at the same time control computational expense by…
Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data…
There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally the FDR in these settings has been defined in terms of the coefficients of the full regression…
Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…
This paper proposes a data-adaptive factor model (DAFM), a novel framework for extracting common factors that explain the structures of high-dimensional data. DAFM adopts a composite quantile strategy to adaptively capture the full…
Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…
In this paper we examine quantile-stratified samples from a known univariate probability distribution, with stratification occurring over a partition of the quantile regions in the distribution. We examine some general properties of this…
The amount of information in the form of features and variables avail- able to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high di- mensional models do not…
Simultaneously performing variable selection and inference in high-dimensional regression models is an open challenge in statistics and machine learning. The increasing availability of vast amounts of variables requires the adoption of…
The goal of feature selection is to identify important features that are relevant to explain an outcome variable. Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to…
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost…