Related papers: Identifying important predictors in large data bas…
The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…
Sorted L-One Penalized Estimation (SLOPE) has shown the nice theoretical property as well as empirical behavior recently on the false discovery rate (FDR) control of high-dimensional feature selection by adaptively imposing the…
We introduce a novel method for sparse regression and variable selection, which is inspired by modern ideas in multiple testing. Imagine we have observations from the linear model y = X beta + z, then we suggest estimating the regression…
In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are…
While data science is battling to extract information from the enormous explosion of data, many estimators and algorithms are being developed for better prediction. Researchers and data scientists often introduce new methods and evaluate…
Many methods have been developed to estimate the set of relevant variables in a sparse linear model Y= XB+e where the dimension p of B can be much higher than the length n of Y. Here we propose two new methods based on multiple hypotheses…
The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for…
Multiple hypothesis testing, a situation when we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), which…
As the volume and complexity of data continue to expand across various scientific disciplines, the need for robust methods to account for the multiplicity of comparisons has grown widespread. A popular measure of type 1 error rate in…
With the increasing deployment of machine learning models in many socially sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from…
The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…
The most popular multiple testing procedures are stepwise procedures based on $P$-values for individual test statistics. Included among these are the false discovery rate (FDR) controlling procedures of Benjamini--Hochberg [J. Roy. Statist.…
Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…
Obtaining high-quality labels for large datasets is expensive, requiring massive annotations from human experts. While AI models offer a cost-effective alternative by predicting labels, their label quality is compromised by the unavoidable…
Conditional modeling x \to y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high dimensional y, where x is either low…
Although significant progress achieved, multi-label classification is still challenging due to the complexity of correlations among different labels. Furthermore, modeling the relationships between input and some (dull) classes further…
Compounding error, where small prediction mistakes accumulate over time, presents a major challenge in learning-based control. A common remedy is to train multi-step predictors directly instead of rolling out single-step models. However, it…
Model selection often aims to choose a single model, assuming that the form of the model is correct. However, there may be multiple possible underlying explanatory patterns in a set of predictors that could explain a response. Model…
Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different approaches in finite-sample…
In the context of multiple hypotheses testing, the proportion $\pi_0$ of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or…