Related papers: Statistical thinking: From Tukey to Vardi and beyo…

Using data network metrics, graphics, and topology to explore network characteristics

Yehuda Vardi introduced the term network tomography and was the first to propose and study how statistical inverse methods could be adapted to attack important network problems (Vardi, 1996). More recently, in one of his final papers, Vardi…

Methodology · Statistics 2007-08-22 A. Adhikari, L. Denby, J. M. Landwehr, J. Meloche

Rethinking Aleatoric and Epistemic Uncertainty

The ideas of aleatoric and epistemic uncertainty are widely used to reason about the probabilistic predictions of machine-learning models. We identify incoherence in existing discussions of these ideas and suggest this stems from the…

Machine Learning · Computer Science 2025-08-19 Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, Tom Rainforth

Sources of Uncertainty in Supervised Machine Learning -- A Statisticians' View

Supervised machine learning and predictive models have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Besides these successes, it becomes clear that beyond pure prediction,…

Machine Learning · Statistics 2025-01-29 Cornelia Gruber, Patrick Oliver Schenk, Malte Schierholz, Frauke Kreuter, Göran Kauermann

Understanding the Limitations of Variational Mutual Information Estimators

Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance…

Machine Learning · Computer Science 2020-03-25 Jiaming Song, Stefano Ermon

Statistical Learning from Biased Training Samples

With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many practical situations, the poor control of the data acquisition processes may…

Machine Learning · Statistics 2022-11-02 Stephan Clémençon, Pierre Laforgue

Model Assisted Data Integration: An unbiased sampling strategy to use nonprobability data

The aim of survey statistics is to produce estimates with a minimal bias and a corresponding acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts…

Methodology · Statistics 2026-04-02 Martin Hyllienmark, Gustaf Strandell

Explainability through uncertainty: Trustworthy decision-making with neural networks

Uncertainty is a key feature of any machine learning model and is particularly important in neural networks, which tend to be overconfident. This overconfidence is worrying under distribution shifts, where the model performance silently…

Machine Learning · Computer Science 2024-03-18 Arthur Thuy, Dries F. Benoit

Bayesian Network Tomography and Inference

The aim of this technical report is to give a short overview of known techniques for network tomography (introduced in the paper of Vardi (1996)), extended by a Bayesian approach originating with Tebaldi and West (1998). Since the studies of…

Networking and Internet Architecture · Computer Science 2007-05-23 Philipp Pluch, Samo Wakounig

A Critical Reflection on the Values and Assumptions in Data Visualization

Visualization has matured into an established research field, producing widely adopted tools, design frameworks, and empirical foundations. As the field has grown, ideas from outside computer science have increasingly entered visualization…

Human-Computer Interaction · Computer Science 2026-02-26 Shehryar Saharan, Ibrahim Al-Hazwani, Miriah Meyer, Laura Garrison

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational…

Machine Learning · Computer Science 2022-04-12 Jannik Wolff, Tassilo Klein, Moin Nabi, Rahul G. Krishnan, Shinichi Nakajima

Node-weighted measures for complex networks with spatially embedded, sampled, or differently sized nodes

When network and graph theory are used in the study of complex systems, a typically finite set of nodes of the network under consideration is frequently either explicitly or implicitly considered representative of a much larger finite or…

Data Analysis, Statistics and Probability · Physics 2015-03-18 Jobst Heitzig, Jonathan F. Donges, Yong Zou, Norbert Marwan, Jürgen Kurths

Modeling Data Analytic Iteration With Probabilistic Outcome Sets

In 1977 John Tukey described how in exploratory data analysis, data analysts use tools, such as data visualizations, to separate their expectations from what they observe. In contrast to statistical theory, an underappreciated aspect of…

Methodology · Statistics 2024-02-02 Roger D. Peng, Stephanie C. Hicks

Why Authors Don't Visualize Uncertainty

Clear presentation of uncertainty is an exception rather than rule in media articles, data-driven reports, and consumer applications, despite proposed techniques for communicating sources of uncertainty in data. This work considers, Why do…

Human-Computer Interaction · Computer Science 2019-08-07 Jessica Hullman

Towards a Theory of Bullshit Visualization

In this unhinged rant, I lay out my suspicion that a lot of visualizations are bullshit: charts that do not have even the common decency to intentionally lie but are totally unconcerned about the state of the world or any practical utility.…

General Literature · Computer Science 2021-09-28 Michael Correll

Feedforward neural networks as statistical models: Improving interpretability through uncertainty quantification

Feedforward neural networks (FNNs) are typically viewed as pure prediction algorithms, and their strong predictive performance has led to their use in many machine-learning applications. However, their flexibility comes with an…

Methodology · Statistics 2023-11-15 Andrew McInerney, Kevin Burke

Bias robustness of depth estimators in multivariate settings

The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by…

Statistics Theory · Mathematics 2026-05-11 Jorge G. Adrover, Marcelo Ruiz

Probabilistic Perspectives on Collecting Human Uncertainty in Predictive Data Mining

In many areas of data mining, data is collected from human beings. In this contribution, we ask the question of how people actually respond to ordinal scales. The main problem observed is that users tend to be volatile in their choices,…

Human-Computer Interaction · Computer Science 2017-03-01 Kevin Jasberg, Sergej Sizov

Improved performance guarantees for Tukey's median

Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics,…

Statistics Theory · Mathematics 2026-01-13 Stanislav Minsker, Yinan Shen

Non-standard conditionally specified models for non-ignorable missing data

Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions…

Methodology · Statistics 2016-03-22 Alexander M Franks, Edoardo M Airoldi, Donald B Rubin

Context Embedding Networks

Low dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. However,…

Machine Learning · Computer Science 2018-03-30 Kun Ho Kim, Oisin Mac Aodha, Pietro Perona