Related papers: Tutorial: Deriving The Efficient Influence Curve f…

200 papers

Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to…

Statistics Theory · Mathematics 2016-09-01 Marco Carone, Alexander R. Luedtke, Mark J. van der Laan

The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data, thereby identifying the most influential training samples and understanding how the model's behavior leads to…

Machine Learning · Computer Science 2025-08-12 Hongbo Zhu, Angelo Cangelosi

Assessing the impact of the training data on machine learning models is crucial for understanding the behavior of the model, enhancing transparency, and selecting training data. The influence function provides a theoretical framework for…

Machine Learning · Computer Science 2026-04-21 Yuchen Zhang, Mohammad Mohammadi Amiri

We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are based on the theory of influence functions, which appear in the semiparametric statistics…

Evaluation of treatment effects and more general estimands is typically achieved via parametric modelling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g. statistical/machine learning) is…

Statistics Theory · Mathematics 2022-01-14 Oliver Hines, Oliver Dukes, Karla Diaz-Ordaz, Stijn Vansteelandt

Epidemiologists increasingly use causal inference methods that rely on machine learning, as these approaches can relax unnecessary model specification assumptions. While deriving and studying asymptotic properties of such estimators is a…

Methodology · Statistics 2025-02-11 Audrey Renson, Lina Montoya, Dana E. Goin, Iván Díaz, Rachael K. Ross

Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…

Machine Learning · Statistics 2017-09-07 Daniel Ting, Eric Brochu

Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model,…

Machine Learning · Computer Science 2019-11-22 Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang

Influence functions provide crucial insights into model training, but existing methods suffer from large computational costs and limited generalization. Particularly, recent works have proposed various metrics and algorithms to calculate…

Machine Learning · Computer Science 2025-10-31 Ishika Agarwal, Dilek Hakkani-Tür

The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions.…

Machine Learning · Statistics 2025-07-17 Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki

Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in…

Machine Learning · Computer Science 2025-05-27 Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard Turner

Understanding the influence of a training instance on a neural network model leads to improving interpretability. However, it is difficult and inefficient to evaluate the influence, which shows how a model's prediction would be changed if a…

Machine Learning · Computer Science 2021-11-22 Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui

Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization…

Machine Learning · Computer Science 2025-04-11 Omri Lev, Ashia C. Wilson

One of the most effective methods of channel pruning is to trim on the basis of the importance of each neuron. However, measuring the importance of each neuron is an NP-hard problem. Previous works have proposed to trim by considering the…

Machine Learning · Computer Science 2021-12-07 Bilan Lai, Haoran Xiang, Furao Shen

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…

Machine Learning · Computer Science 2024-04-02 Zayd Hammoudeh, Daniel Lowd

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data,…

Machine Learning · Statistics 2021-01-01 Pang Wei Koh, Percy Liang

Many useful parameters depend on nonparametric first steps. Examples include games, dynamic discrete choice, average exact consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a…

Methodology · Statistics 2021-07-29 Hidehiko Ichimura, Whitney K. Newey

Dependency functions of dependent variables are relevant for i) performing uncertainty quantification and sensitivity analysis in the presence of dependent and/or correlated variables, and ii) simulating random dependent variables. In…

Methodology · Statistics 2022-03-22 Matieyendou Lamboni

Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for…

Computation and Language · Computer Science 2022-10-26 Sarthak Jain, Varun Manjunatha, Byron C. Wallace, Ani Nenkova