Tom Rainforth — Scifaro

Loss-Driven Bayesian Active Learning

The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We…

Machine Learning · Computer Science 2026-05-11 Zhuoyue Huang, Freddie Bickford Smith, Tom Rainforth

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental…

Computation and Language · Computer Science 2026-04-22 Deepro Choudhury, Sinead Williamson, Adam Goliński, Ning Miao, Freddie Bickford Smith, Michael Kirchhof, Yizhe Zhang, Tom Rainforth

Active Learning with Task-Driven Representations for Messy Pools

Active learning has the potential to be especially useful for messy, uncurated pools where datapoints vary in relevance to the target task. However, state-of-the-art approaches to this problem currently rely on using fixed, unsupervised…

Machine Learning · Computer Science 2026-02-16 Kianoosh Ashouritaklimi, Tom Rainforth

Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design

We develop a semi-amortized, policy-based approach to Bayesian experimental design (BED) called Stepwise Deep Adaptive Design (Step-DAD). Like existing, fully amortized, policy-based BED approaches, Step-DAD trains a design policy upfront…

Machine Learning · Statistics 2026-01-30 Marcel Hedman, Desi R. Ivanova, Cong Guan, Tom Rainforth

Prediction-Oriented Subsampling from Data Streams

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data…

Machine Learning · Computer Science 2025-12-23 Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth, Stephen Roberts

Scaling Up Active Testing to Large Language Models

Active testing enables label-efficient evaluation of predictive models through careful data acquisition, but it can pose a significant computational cost. We identify cost-saving measures that enable active testing to be scaled up to large…

Machine Learning · Computer Science 2025-11-26 Gabrielle Berrada, Jannik Kossen, Freddie Bickford Smith, Muhammed Razzak, Yarin Gal, Tom Rainforth

Kosmos: An AI Scientist for Autonomous Discovery

Data-driven scientific discovery requires iterative cycles of literature search, hypothesis generation, and data analysis. Substantial progress has been made towards AI agents that can automate scientific research, but all such agents…

Artificial Intelligence · Computer Science 2025-11-06 Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Daniel L. Barabasi, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk Caldas, Albert Bou, Kaleigh F. Roberts, Sladjana Zagorac, Timothy C. Orr, Miranda E. Orr, Kevin J. Zwezdaryk, Ali E. Ghareeb, Laurie McCoy, Bruna Gomes, Euan A. Ashley, Karen E. Duff, Tonio Buonassisi, Tom Rainforth, Randall J. Bateman, Michael Skarlinski, Samuel G. Rodriques, Michaela M. Hinks, Andrew D. White

A Geometric Approach to Optimal Experimental Design

We introduce a novel geometric framework for optimal experimental design (OED). Traditional OED approaches, such as those based on mutual information, rely explicitly on probability densities, leading to restrictive invariance properties.…

Machine Learning · Statistics 2025-10-17 Gavin Kerrigan, Christian A. Naesseth, Tom Rainforth

Rethinking Aleatoric and Epistemic Uncertainty

The ideas of aleatoric and epistemic uncertainty are widely used to reason about the probabilistic predictions of machine-learning models. We identify incoherence in existing discussions of these ideas and suggest this stems from the…

Machine Learning · Computer Science 2025-08-19 Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, Tom Rainforth

Shh, don't say that! Domain Certification in LLMs

Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains. For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to enhance…

Computation and Language · Computer Science 2025-03-10 Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip H. S. Torr, Adel Bibi

Incorporating Unlabelled Data into Bayesian Neural Networks

Conventional Bayesian Neural Networks (BNNs) are unable to leverage unlabelled data to improve their predictions. To overcome this limitation, we introduce Self-Supervised Bayesian Neural Networks, which use unlabelled data to learn models…

Machine Learning · Computer Science 2024-09-02 Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models…

Machine Learning · Statistics 2024-06-07 Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola

Making Better Use of Unlabelled Data in Bayesian Active Learning

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed…

Machine Learning · Computer Science 2024-04-29 Freddie Bickford Smith, Adam Foster, Tom Rainforth

Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support

The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior…

Machine Learning · Computer Science 2024-04-15 Tim Reichelt, Luke Ong, Tom Rainforth

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

The predictions of Large Language Models (LLMs) on downstream tasks often improve significantly when including examples of the input-label relationship in the context. However, there is currently no consensus about how this in-context…

Computation and Language · Computer Science 2024-03-14 Jannik Kossen, Yarin Gal, Tom Rainforth

On the Expected Size of Conformal Prediction Sets

While conformal predictors reap the benefits of rigorous statistical guarantees on their error frequency, the size of their corresponding prediction sets is critical to their practical utility. Unfortunately, there is currently a lack of…

Machine Learning · Statistics 2024-03-12 Guneet S. Dhillon, George Deligiannidis, Tom Rainforth

Modern Bayesian Experimental Design

Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this…

Machine Learning · Statistics 2023-11-30 Tom Rainforth, Adam Foster, Desi R. Ivanova, Freddie Bickford Smith

Rethinking Variational Inference for Probabilistic Programs with Stochastic Support

We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem rely on designing a single global variational…

Machine Learning · Computer Science 2023-11-02 Tim Reichelt, Luke Ong, Tom Rainforth

Trans-Dimensional Generative Modeling via Jump Diffusion Models

We propose a new class of generative models that naturally handle data of varying dimensionality by jointly modeling the state and dimension of each datapoint. The generative process is formulated as a jump diffusion process that makes…

Machine Learning · Statistics 2023-10-31 Andrew Campbell, William Harvey, Christian Weilbach, Valentin De Bortoli, Tom Rainforth, Arnaud Doucet

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems…

Artificial Intelligence · Computer Science 2023-10-06 Ning Miao, Yee Whye Teh, Tom Rainforth