Related papers: Testing Most Influential Sets

200 papers

Study samples often differ from the target populations of inference and policy decisions in non-random ways. Researchers typically believe that such departures from random sampling -- due to changes in the population over time and space, or…

Methodology · Statistics 2023-07-20 Tamara Broderick, Ryan Giordano, Rachael Meager

How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective…

Machine Learning · Computer Science 2025-01-10 Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma

Modelling multivariate tail dependence is one of the key challenges in extreme-value theory. Multivariate extremes are usually characterized using parametric models, some of which have simpler submodels at the boundary of their parameter…

Methodology · Statistics 2018-12-17 Anna Kiriliouk

Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization…

Machine Learning · Computer Science 2025-04-11 Omri Lev, Ashia C. Wilson

Heavy-tailed metrics are common and often critical to product evaluation in the online world. While we may have samples large enough for Central Limit Theorem to kick in, experimentation is challenging due to the wide confidence interval of…

Applications · Statistics 2019-05-23 Jason Wang, Pauline Burke

Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…

Machine Learning · Computer Science 2024-04-02 Zayd Hammoudeh, Daniel Lowd

In many practical situations exploratory plots are helpful in understanding tail behavior of sample data. The Mean Excess plot is often applied in practice to understand the right tail behavior of a data set. It is known that if the…

Statistics Theory · Mathematics 2015-01-06 Bikramjit Das, Souvik Ghosh
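The mean excess plot mentioned in the Das and Ghosh entry graphs the empirical mean excess function e_n(u), the average of x - u over the observations that exceed u, against the threshold u. A minimal sketch, assuming the caller supplies the thresholds (the sorted sample itself is a common choice); the function name `empirical_mean_excess` is hypothetical and not taken from the paper. As a reference point, a Pareto tail with shape α > 1 has e(u) = u/(α - 1) above its scale parameter, so a roughly linear upward trend suggests a heavy right tail, while an exponential tail gives a flat e(u).

```python
def empirical_mean_excess(sample, thresholds):
    """Empirical mean excess e_n(u): mean of (x - u) over sample points with x > u.

    Thresholds with no exceedances are skipped. Returns a list of (u, e_n(u)).
    """
    points = []
    for u in thresholds:
        excesses = [x - u for x in sample if x > u]
        if excesses:
            points.append((u, sum(excesses) / len(excesses)))
    return points


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Sorted sample used as thresholds; the maximum has no exceedances and is skipped.
    print(empirical_mean_excess(data, sorted(data)))
    # -> [(1.0, 2.5), (2.0, 2.0), (3.0, 1.5), (4.0, 1.0)]
```

Plotting these (u, e_n(u)) pairs gives the mean excess plot; the last few points rest on very few exceedances and are usually read with caution.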

In meta-analysis, the random-effects models are standard tools to address between-study heterogeneity in evidence synthesis analyses. For the random-effects distribution models, the normal distribution model has been adopted in most…

Applications · Statistics 2021-07-28 Hisashi Noma, Kengo Nagashima, Shogo Kato, Satoshi Teramukai, Toshi A. Furukawa

We study the asymptotic behaviour of widely used tests for evaluating and comparing predictive accuracy when forecast errors exhibit heavy tails. In particular, when loss differentials have infinite variance, the Diebold-Mariano test…

Methodology · Statistics 2026-05-20 Jonas F. Frederiksen, Muneya Matsui, Rasmus S. Pedersen

Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in…

Machine Learning · Computer Science 2026-04-06 Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A. P, Sai Praneeth Karimireddy

Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model,…

Machine Learning · Computer Science 2019-11-22 Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang
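The first-order approximation described in the Koh et al. entry can be checked by hand for ordinary least squares, where the exact effect of deleting one point has a closed form. A minimal sketch, assuming a simple regression y = a + b·x with squared loss; the helper names (`fit_ols`, `xtx_inverse`, `influence_vs_exact`) are hypothetical and not taken from the paper. For this loss, the influence-function estimate of removing point i is -(XᵀX)⁻¹ x_i e_i, where e_i is the residual, while the exact change divides that term by 1 - h_ii (h_ii being the leverage), so the two agree only when the leverage is small.

```python
# Sketch: first-order (influence-function) vs exact leave-one-out change
# for simple OLS y = a + b*x. Helper names are illustrative, not from the paper.

def fit_ols(xs, ys):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def xtx_inverse(xs):
    """Inverse of X^T X for the design matrix with rows [1, x]."""
    n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)
    det = n * s2 - s1 * s1
    return [[s2 / det, -s1 / det], [-s1 / det, n / det]]


def influence_vs_exact(xs, ys, i):
    """Return (first_order, closed_form, refit, leverage) for deleting point i.

    The first three are coefficient changes [delta_a, delta_b].
    """
    a, b = fit_ols(xs, ys)
    inv = xtx_inverse(xs)
    xi = [1.0, xs[i]]
    resid = ys[i] - (a + b * xs[i])
    # g = (X^T X)^{-1} x_i
    g = [inv[0][0] * xi[0] + inv[0][1] * xi[1],
         inv[1][0] * xi[0] + inv[1][1] * xi[1]]
    # First-order influence estimate of removal: -(X^T X)^{-1} x_i e_i
    first_order = [-gk * resid for gk in g]
    # Leverage h_ii = x_i^T (X^T X)^{-1} x_i
    h = xi[0] * g[0] + xi[1] * g[1]
    # Exact deletion formula rescales the first-order term by 1 / (1 - h_ii)
    closed_form = [c / (1.0 - h) for c in first_order]
    # Brute-force check: refit without point i
    a_del, b_del = fit_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    refit = [a_del - a, b_del - b]
    return first_order, closed_form, refit, h


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
    ys = [0.1, 1.2, 1.9, 3.2, 3.9, 12.0]
    first_order, closed_form, refit, h = influence_vs_exact(xs, ys, 5)
    print("leverage:", round(h, 3))
    print("first-order:", first_order)
    print("exact refit:", refit)
```

In this toy data the point at x = 10 has leverage 33/38 ≈ 0.868, so the first-order estimate is only about 0.13 times the exact change: a concrete case where the "sufficiently small change" condition fails.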

Heavy tailed distributions present a tough setting for inference. They are also common in industrial applications, particularly with Internet transaction datasets, and machine learners often analyze such data without considering the biases…

Applications · Statistics 2016-10-14 Matt Taddy, Hedibert Freitas Lopes, Matt Gardner

Modern statistical analyses often encounter datasets with massive sizes and heavy-tailed distributions. For datasets with massive sizes, traditional estimation methods can hardly be used to estimate the extreme value index directly. To…

Methodology · Statistics 2022-07-26 Yongxin Li, Liujun Chen, Deyuan Li, Hansheng Wang

Standard inference about a scalar parameter estimated via GMM amounts to applying a t-test to a particular set of observations. If the number of observations is not very large, then moderately heavy tails can lead to poor behavior of the…

Econometrics · Economics 2020-07-15 Ulrich K. Mueller

This paper introduces the Trimmed Functional Empirical Process (TFEP) as a robust framework for statistical inference when dealing with heavy-tailed or skewed distributions, where classical moments such as the mean or variance may be…

Methodology · Statistics 2025-12-09 Abdoulaye Camara, Saliou Diouf, Moumouni Diallo, Gane Samb Lo

Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential…

Machine Learning · Statistics 2023-09-21 Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui

Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence…

Machine Learning · Statistics 2014-06-20 Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current…

Machine Learning · Computer Science 2024-06-21 Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

In this work, we establish risk bounds for the Empirical Risk Minimization (ERM) with both dependent and heavy-tailed data-generating processes. We do so by extending the seminal works of Mendelson [Men15, Men18] on the analysis of ERM with…

Statistics Theory · Mathematics 2021-09-14 Abhishek Roy, Krishnakumar Balasubramanian, Murat A. Erdogdu

Standard methods for determining the number of factors often overestimate the true number when data exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as genuine factors. This paper addresses this challenge within the…

Methodology · Statistics 2026-03-04 Jiang Hu, Jiahui Xie, Yangchun Zhang, Wang Zhou