Related papers: Assessing Model Generalization in Vicinity

Risk Variance Penalization

The key to out-of-distribution (OOD) generalization is to generalize invariance from training domains to target domains. Variance risk extrapolation (V-REx) is a practical OOD method that depends on a domain-level regularization…

Machine Learning · Computer Science 2021-04-12 Chuanlong Xie, Haotian Ye, Fei Chen, Yue Liu, Rui Sun, Zhenguo Li

Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings

Reliable generalization metrics are fundamental to the evaluation of machine learning models. Especially in high-stakes applications where labeled target data are scarce, evaluation of models' generalization performance under distribution…

Machine Learning · Computer Science 2026-04-10 Yunxiang Peng, Mengmeng Ma, Ziyu Yao, Xi Peng

Model validation for aggregate inferences in out-of-sample prediction

Generalization to new samples is a fundamental rationale for statistical modeling. For this purpose, model validation is particularly important, but recent work in survey inference has suggested that simple aggregation of individual…

Methodology · Statistics 2024-04-15 Lauren Kennedy, Aki Vehtari, Andrew Gelman

Assaying Out-Of-Distribution Generalization in Transfer Learning

Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs…

Machine Learning · Computer Science 2022-10-24 Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello

Principles from Clinical Research for NLP Model Generalization

The NLP community typically relies on performance of a model on a held-out test set to assess generalization. Performance drops observed in datasets outside of official test sets are generally attributed to "out-of-distribution" effects.…

Computation and Language · Computer Science 2024-04-03 Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor

Model Correlation Detection via Random Selection Probing

The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing…

Machine Learning · Computer Science 2025-09-30 Ruibo Chen, Sheng Zhang, Yihan Wu, Tong Zheng, Peihua Mai, Heng Huang

Generalization Bounds for Vicinal Risk Minimization Principle

The vicinal risk minimization (VRM) principle, first proposed by Vapnik (1999), is an empirical risk minimization (ERM) variant that replaces Dirac masses with vicinal functions. Although there is strong numerical evidence…

Machine Learning · Computer Science 2018-11-13 Chao Zhang, Min-Hsiu Hsieh, Dacheng Tao

Confidence and Dispersity as Signals: Unsupervised Model Evaluation and Ranking

Assessing model generalization under distribution shift is essential for real-world deployment, particularly when labeled test data is unavailable. This paper presents a unified and practical framework for unsupervised model evaluation and…

Machine Learning · Computer Science 2025-10-06 Weijian Deng, Weijie Tu, Ibrahim Radwan, Mohammad Abu Alsheikh, Stephen Gould, Liang Zheng

Know your population and know your model: Using model-based regression and poststratification to generalize findings beyond the observed sample

Psychology research focuses on interactions, and this has deep implications for inference from non-representative samples. For the goal of estimating average treatment effects, we propose to fit a model allowing treatment to interact with…

Applications · Statistics 2020-04-15 Lauren Kennedy, Andrew Gelman

On the benefits of defining vicinal distributions in latent space

The vicinal risk minimization (VRM) principle is an empirical risk minimization (ERM) variant that replaces Dirac masses with vicinal functions. There is strong numerical and theoretical evidence showing that VRM outperforms ERM in terms of…

Machine Learning · Computer Science 2021-10-19 Puneet Mangla, Vedant Singh, Shreyas Jayant Havaldar, Vineeth N Balasubramanian

A Variational Dirichlet Framework for Out-of-Distribution Detection

With the recent rapid development of deep learning, deep neural networks have been widely adopted in many real-life applications. However, deep neural networks are also known to have very little control over their uncertainty for unseen…

Machine Learning · Computer Science 2019-04-23 Wenhu Chen, Yilin Shen, Hongxia Jin, William Wang

Probabilistic Test-Time Generalization by Variational Neighbor-Labeling

This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains. We follow the strict separation of source training and target testing, but exploit the…

Machine Learning · Computer Science 2024-07-02 Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality

Recent work has shown that deep generative models can assign higher likelihood to out-of-distribution data sets than to their training data (Nalisnick et al., 2019; Choi et al., 2019). We posit that this phenomenon is caused by a mismatch…

Machine Learning · Statistics 2019-10-17 Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Balaji Lakshminarayanan

Goodness-of-fit testing of the distribution of posterior classification probabilities for validating model-based clustering

We present the first method for assessing the relevance of a model-based clustering result in a general framework. Standard validation criteria, like the adjusted Rand index, rely on external labels to assess partition accuracy;…

Statistics Theory · Mathematics 2026-03-30 Salima El Kolei, Matthieu Marbac

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly, but overestimate their performance. In this work, we aim to better estimate a model's performance under…

Machine Learning · Computer Science 2020-07-08 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

On the Importance of Feature Separability in Predicting Out-Of-Distribution Error

Estimating the generalization performance is practically challenging on out-of-distribution (OOD) data without ground-truth labels. While previous methods emphasize the connection between distribution difference and OOD accuracy, we show…

Machine Learning · Computer Science 2023-10-24 Renchunzi Xie, Hongxin Wei, Lei Feng, Yuzhou Cao, Bo An

Robustness to Augmentations as a Generalization metric

Generalization is the ability of a model to predict on unseen domains and is a fundamental task in machine learning. Several generalization bounds, both theoretical and empirical, have been proposed, but they do not provide tight bounds. In…

Machine Learning · Computer Science 2021-01-19 Sumukh Aithal K, Dhruva Kashyap, Natarajan Subramanyam

Muddling Labels for Regularization, a novel approach to generalization

Generalization is a central problem in Machine Learning. Indeed most prediction methods require careful calibration of hyperparameters usually carried out on a hold-out validation dataset to achieve generalization. The main goal of…

Machine Learning · Statistics 2021-02-18 Karim Lounici, Katia Meziani, Benjamin Riu

Assessing the Calibration of Subdistribution Hazard Models in Discrete Time

The generalization performance of a risk prediction model can be evaluated by its calibration, which measures the agreement between predicted and observed outcomes on external validation data. Here, methods for assessing the calibration of…

Methodology · Statistics 2020-01-31 Moritz Berger, Matthias Schmid

Risk and cross validation in ridge regression with correlated samples

Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free…

Machine Learning · Statistics 2025-11-06 Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan