机器学习
In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal…
Calibration is a critical requirement for reliable probabilistic prediction, especially in high-risk applications. However, the theoretical understanding of which learning algorithms can simultaneously achieve high accuracy and good…
This work investigates the expected number of critical points of random neural networks with different activation functions as the depth increases in the infinite-width limit. Under suitable regularity conditions, we derive precise…
We consider the graph alignment problem, wherein the objective is to find a vertex correspondence between two graphs that maximizes the edge overlap. The graph alignment problem is an instance of the quadratic assignment problem (QAP),…
The key to VI is the selection of a tractable density to approximate the Bayesian posterior. For large and complex models a common choice is to assume independence between multivariate blocks in a partition of the parameter space. While…
Simplicial complexes (SCs) have become a popular abstraction for analyzing complex data using tools from topological data analysis or topological signal processing. However, the analysis of many real-world datasets often leads to dense SCs,…
Large Language Models (LLMs) are rapidly gaining enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the distillation and inclusion of copyrighted…
Existing tools for explaining complex models and systems are associational rather than causal and do not provide mechanistic understanding. We propose a new notion called counterfactual explainability for causal attribution that is…
The aim of these notes is to demonstrate the potential for ideas in machine learning to impact on the fields of inverse problems and data assimilation. The perspective is one that is primarily aimed at researchers from inverse problems…
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs. We derive upper bounds on the approximation error of RNNs for…
Kernel ridge regression (KRR) is a popular class of machine learning models that has become an important tool for understanding deep learning. Much of the focus thus far has been on studying the proportional asymptotic regime, $n \asymp d$,…
We study nonparametric regression using an over-parameterized two-layer neural networks trained with algorithmic guarantees in this paper. We consider the setting where the training features are drawn uniformly from the unit sphere in…
We develop a fitted value iteration (FVI) method to compute bicausal optimal transport (OT) where couplings have an adapted structure. Based on the dynamic programming formulation, FVI adopts a function class to approximate the value…
Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance…
In this work, we explore how Neural Jump ODEs (NJODEs) can be used as generative models for It\^o processes. Given (discrete observations of) samples of a fixed underlying It\^o process, the NJODE framework can be used to approximate the…
Deep neural networks excel in high-dimensional problems, outperforming models such as kernel methods, which suffer from the curse of dimensionality. However, the theoretical foundations of this success remain poorly understood. We follow…
Diffusion models have emerged as powerful generative frameworks with widespread applications across machine learning and artificial intelligence systems. While current research has predominantly focused on linear diffusions, these…
The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are…
In this paper, we propose a novel generalisation of the signature of a path, motivated by fractional calculus, which is able to describe the solutions of linear Caputo controlled FDEs. We also propose another generalisation of the…
Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a…