Related papers: Tutorial: Deriving The Efficient Influence Curve f…
Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to…
The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data. thereby identifying the most influential training samples and understanding how the model's behavior leads to…
Assessing the impact the training data on machine learning models is crucial for understanding the behavior of the model, enhancing the transparency, and selecting training data. Influence function provides a theoretical framework for…
We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are based on the theory of influence functions, which appear in the semiparametric statistics…
Evaluation of treatment effects and more general estimands is typically achieved via parametric modelling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g. statistical/machine learning) is…
Epidemiologists increasingly use causal inference methods that rely on machine learning, as these approaches can relax unnecessary model specification assumptions. While deriving and studying asymptotic properties of such estimators is a…
Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…
Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model,…
Influence functions provide crucial insights into model training, but existing methods suffer from large computational costs and limited generalization. Particularly, recent works have proposed various metrics and algorithms to calculate…
The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions.…
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in…
Understanding the influence of a training instance on a neural network model leads to improving interpretability. However, it is difficult and inefficient to evaluate the influence, which shows how a model's prediction would be changed if a…
Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization…
One of the most effective methods of channel pruning is to trim on the basis of the importance of each neuron. However, measuring the importance of each neuron is an NP-hard problem. Previous works have proposed to trim by considering the…
Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…
Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data,…
Many useful parameters depend on nonparametric first steps. Examples include games, dynamic discrete choice, average exact consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a…
Dependency functions of dependent variables are relevant for i) performing uncertainty quantification and sensitivity analysis in presence of dependent variables and/or correlated variables, and ii) simulating random dependent variables. In…
Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for…