Related papers: Testing Most Influential Sets

An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?

Study samples often differ from the target populations of inference and policy decisions in non-random ways. Researchers typically believe that such departures from random sampling -- due to changes in the population over time and space, or…

Methodology · Statistics 2023-07-20 Tamara Broderick , Ryan Giordano , Rachael Meager

Most Influential Subset Selection: Challenges, Promises, and Beyond

How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective…

Machine Learning · Computer Science 2025-01-10 Yuzheng Hu , Pingbang Hu , Han Zhao , Jiaqi W. Ma

Hypothesis testing for tail dependence parameters on the boundary of the parameter space

Modelling multivariate tail dependence is one of the key challenges in extreme-value theory. Multivariate extremes are usually characterized using parametric models, some of which have simpler submodels at the boundary of their parameter…

Methodology · Statistics 2018-12-17 Anna Kiriliouk

The Approximate Fisher Influence Function: Faster Estimation of Data Influence in Statistical Models

Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization…

Machine Learning · Computer Science 2025-04-11 Omri Lev , Ashia C. Wilson

Measuring Average Treatment Effect from Heavy-tailed Data

Heavy-tailed metrics are common and often critical to product evaluation in the online world. While we may have samples large enough for Central Limit Theorem to kick in, experimentation is challenging due to the wide confidence interval of…

Applications · Statistics 2019-05-23 Jason Wang , Pauline Burke

Training Data Influence Analysis and Estimation: A Survey

Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…

Machine Learning · Computer Science 2024-04-02 Zayd Hammoudeh , Daniel Lowd

Detecting tail behavior: mean excess plots with confidence bounds

In many practical situations exploratory plots are helpful in understanding tail behavior of sample data. The Mean Excess plot is often applied in practice to understand the right tail behavior of a data set. It is known that if the…

Statistics Theory · Mathematics 2015-01-06 Bikramjit Das , Souvik Ghosh

Flexible random-effects distribution models for meta-analysis

In meta-analysis, the random-effects models are standard tools to address between-study heterogeneity in evidence synthesis analyses. For the random-effects distribution models, the normal distribution model has been adopted in most…

Applications · Statistics 2021-07-28 Hisashi Noma , Kengo Nagashima , Shogo Kato , Satoshi Teramukai , Toshi A. Furukawa

Heavy Tails and Predictive Ability Testing

We study the asymptotic behaviour of widely used tests for evaluating and comparing predictive accuracy when forecast errors exhibit heavy tails. In particular, when loss differentials have infinite variance, the Diebold-Mariano test…

Methodology · Statistics 2026-05-20 Jonas F. Frederiksen , Muneya Matsui , Rasmus S. Pedersen

f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness

Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in…

Machine Learning · Computer Science 2026-04-06 Subhodip Panda , Dhruv Tarsadiya , Shashwat Sourav , Prathosh A. P , Sai Praneeth Karimireddy

On the Accuracy of Influence Functions for Measuring Group Effects

Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model,…

Machine Learning · Computer Science 2019-11-22 Pang Wei Koh , Kai-Siang Ang , Hubert H. K. Teo , Percy Liang

Scalable semiparametric inference for the means of heavy-tailed distributions

Heavy tailed distributions present a tough setting for inference. They are also common in industrial applications, particularly with Internet transaction datasets, and machine learners often analyze such data without considering the biases…

Applications · Statistics 2016-10-14 Matt Taddy , Hedibert Freitas Lopes , Matt Gardner

Estimating Extreme Value Index by Subsampling for Massive Datasets with Heavy-Tailed Distributions

Modern statistical analyses often encounter datasets with massive sizes and heavy-tailed distributions. For datasets with massive sizes, traditional estimation methods can hardly be used to estimate the extreme value index directly. To…

Methodology · Statistics 2022-07-26 Yongxin Li , Liujun Chen , Deyuan Li , Hansheng Wang

A More Robust t-Test

Standard inference about a scalar parameter estimated via GMM amounts to applying a t-test to a particular set of observations. If the number of observations is not very large, then moderately heavy tails can lead to poor behavior of the…

Econometrics · Economics 2020-07-15 Ulrich K. Mueller

Asymptotic theory and statistical inference for the samples problems with heavy-tailed data using the functional empirical process

This paper introduces the Trimmed Functional Empirical Process (TFEP) as a robust framework for statistical inference when dealing with heavy-tailed or skewed distributions, where classical moments such as the mean or variance may be…

Methodology · Statistics 2025-12-09 Abdoulaye Camara , Saliou Diouf , Moumouni Diallo , Gane Samb Lo

Statistical and Computational Guarantees for Influence Diagnostics

Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential…

Machine Learning · Statistics 2023-09-21 Jillian Fisher , Lang Liu , Krishna Pillutla , Yejin Choi , Zaid Harchaoui

Fast and Robust Least Squares Estimation in Corrupted Linear Models

Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence…

Machine Learning · Statistics 2014-06-20 Brian McWilliams , Gabriel Krummenacher , Mario Lucic , Joachim M. Buhmann

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current…

Machine Learning · Computer Science 2024-06-21 Myeongseob Ko , Feiyang Kang , Weiyan Shi , Ming Jin , Zhou Yu , Ruoxi Jia

On Empirical Risk Minimization with Dependent and Heavy-Tailed Data

In this work, we establish risk bounds for the Empirical Risk Minimization (ERM) with both dependent and heavy-tailed data-generating processes. We do so by extending the seminal works of Mendelson [Men15, Men18] on the analysis of ERM with…

Statistics Theory · Mathematics 2021-09-14 Abhishek Roy , Krishnakumar Balasubramanian , Murat A. Erdogdu

The Spurious Factor Dilemma: Robust Inference in Heavy-Tailed Elliptical Factor Models

Standard methods for determining the number of factors often overestimate the true number when data exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as genuine factors. This paper addresses this challenge within the…

Methodology · Statistics 2026-03-04 Jiang Hu , Jiahui Xie , Yangchun Zhang , Wang Zhou