Robert C. Williamson
Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and…
Machine Learning research, including work promoting fair or equitable algorithms, often relies on the concept of a data-generating probability distribution. The standard presumption is that since data points are 'sampled from' such a…
I examine the technology of machine learning from the perspective of rhetoric, which is simply the art of persuasion. Rather than being a neutral and "objective" way to build "world models" from data, machine learning is (I argue)…
Our paper studies the predictability of online speech -- that is, how well language models learn to model the distribution of user generated content on X (previously Twitter). We define predictability as a measure of the model's…
Many leading classification algorithms output a classifier that is a weighted average of kernel evaluations. Optimizing these weights is a nontrivial problem that still attracts much research effort. Furthermore, explaining these methods to…
We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This distance, inspired by optimal transport, facilitates stability results; one can quantify how seriously issues like sampling bias,…
The standard approach to causal modelling especially in social and health sciences is the potential outcomes framework due to Neyman and Rubin. In this framework, observations are thought to be drawn from a distribution over variables of…
Machine learning is about forecasting. When the forecasts come with an evaluation metric the forecasts become useful. What are reasonable evaluation metrics? How do existing evaluation metrics relate? In this work, we provide a general…
Fueled by discussions around "trustworthiness" and algorithmic fairness, calibration of predictive systems has regained scholars attention. The vanilla definition and understanding of calibration is, simply put, on all days on which the…
Motivated by recently emerging problems in machine learning and statistics, we propose data models which relax the familiar i.i.d. assumption. In essence, we seek to understand what it means for data to come from a set of probability…
What does it mean to say that, for example, the probability for rain tomorrow is between 20% and 30%? The theory for the evaluation of precise probabilistic forecasts is well-developed and is grounded in the key concepts of proper scoring…
Supervised learning has gone beyond the expected risk minimization framework. Central to most of these developments is the introduction of more general aggregation functions for losses incurred by the learner. In this paper, we turn towards…
Machine learning typically presupposes classical probability theory which implies that aggregation is built upon expectation. There are now multiple reasons to motivate looking at richer alternatives to classical probability theory as a…
We argue that insurance can act as an analogon for the social situatedness of machine learning systems, hence allowing machine learning scholars to take insights from the rich and interdisciplinary insurance literature. Tracing the…
We introduce two new classes of measures of information for statistical experiments which generalise and subsume $\phi$-divergences, integral probability metrics, $\mathfrak{N}$-distances (MMD), and $(f,\Gamma)$ divergences between two or…
Statistical decision problems lie at the heart of statistical machine learning. The simplest problems are binary and multiclass classification and class probability estimation. Central to their definition is the choice of loss function,…
Strict frequentism defines probability as the limiting relative frequency in an infinite sequence. What if the limit does not exist? We present a broader theory, which is applicable also to random phenomena that exhibit diverging relative…
In literature on imprecise probability little attention is paid to the fact that imprecise probabilities are precise on a set of events. We call these sets systems of precision. We show that, under mild assumptions, the system of precision…
Mixable loss functions are of fundamental importance in the context of prediction with expert advice in the online setting since they characterize fast learning rates. By re-interpreting properness from the point of view of differential…
Expected risk minimization (ERM) is at the core of many machine learning systems. This means that the risk inherent in a loss distribution is summarized using a single number - its average. In this paper, we propose a general approach to…