Related papers: Statistical thinking: From Tukey to Vardi and beyo…
Yehuda Vardi introduced the term network tomography and was the first to propose and study how statistical inverse methods could be adapted to attack important network problems (Vardi, 1996). More recently, in one of his final papers, Vardi…
The ideas of aleatoric and epistemic uncertainty are widely used to reason about the probabilistic predictions of machine-learning models. We identify incoherence in existing discussions of these ideas and suggest this stems from the…
Supervised machine learning and predictive models have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Besides these successes, it becomes clear, that beyond pure prediction,…
Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance…
With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many practical situations, the poor control of the data acquisition processes may…
The aim of survey statistics is to produce estimates with a minimal bias and a corresponding acceptable variance given a specific budget, preferable with a minor response burden for the participants. In recent years, considerable efforts…
Uncertainty is a key feature of any machine learning model and is particularly important in neural networks, which tend to be overconfident. This overconfidence is worrying under distribution shifts, where the model performance silently…
The aim of this technical report is to give a short overview of known techniques for network tomography (introduced in the paper of Vardi (1996)), extended by a Bayesian approach originating Tebaldi and West (1998). Since the studies of…
Visualization has matured into an established research field, producing widely adopted tools, design frameworks, and empirical foundations. As the field has grown, ideas from outside computer science have increasingly entered visualization…
Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational…
When network and graph theory are used in the study of complex systems, a typically finite set of nodes of the network under consideration is frequently either explicitly or implicitly considered representative of a much larger finite or…
In 1977 John Tukey described how in exploratory data analysis, data analysts use tools, such as data visualizations, to separate their expectations from what they observe. In contrast to statistical theory, an underappreciated aspect of…
Clear presentation of uncertainty is an exception rather than rule in media articles, data-driven reports, and consumer applications, despite proposed techniques for communicating sources of uncertainty in data. This work considers, Why do…
In this unhinged rant, I lay out my suspicion that a lot of visualizations are bullshit: charts that do not have even the common decency to intentionally lie but are totally unconcerned about the state of the world or any practical utility.…
Feedforward neural networks (FNNs) are typically viewed as pure prediction algorithms, and their strong predictive performance has led to their use in many machine-learning applications. However, their flexibility comes with an…
The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by…
In many areas of data mining, data is collected from humans beings. In this contribution, we ask the question of how people actually respond to ordinal scales. The main problem observed is that users tend to be volatile in their choices,…
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics,…
Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions…
Low dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. However,…