Related papers: Variable importance scores
Recommender systems are seen as an effective tool to address information overload, but it is widely known that the presence of various biases makes direct training on large-scale observational data result in sub-optimal prediction…
Variable importance measures (VIMs) aim to quantify the contribution of each input covariate to the predictability of a given output. With the growing interest in explainable AI, numerous VIMs have been proposed, many of which are heuristic…
Score matching is a vital tool for learning the distribution of data with applications across many areas including diffusion processes, energy based modelling, and graphical model estimation. Despite all these applications, little work…
Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation…
In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if…
Instrumental variable methods provide useful tools for inferring causal effects in the presence of unmeasured confounding. To apply these methods with large-scale data sets, a major challenge is to find valid instruments from a possibly…
A common approach for feature selection is to examine the variable importance scores for a machine learning model, as a way to understand which features are the most relevant for making predictions. Given the significance of feature…
In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on…
Mitigating bias in training on biased datasets is an important open problem. Several techniques have been proposed, however the typical evaluation regime is very limited, considering very narrow data conditions. For instance, the effect of…
This work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family…
Policy gradient methods are very attractive in reinforcement learning due to their model-free nature and convergence guarantees. These methods, however, suffer from high variance in gradient estimation, resulting in poor sample efficiency.…
Identifying latent variables and the causal structure involving them is essential across various scientific fields. While many existing works fall under the category of constraint-based methods (with e.g. conditional independence or rank…
A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search…
We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more…
Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant…
The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically…
Generally speaking, the model training for recommender systems can be based on two types of data, namely explicit feedback and implicit feedback. Moreover, because of its general availability, we see wide adoption of implicit feedback data,…
In most real-world recommender systems, the observed rating data are subject to selection bias, and the data are thus missing-not-at-random. Developing a method to facilitate the learning of a recommender with biased feedback is one of the…
Models notoriously suffer from dataset biases which are detrimental to robustness and generalization. The identify-emphasize paradigm shows a promising effect in dealing with unknown biases. However, we find that it is still plagued by two…
It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features…