Tim Roith
Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $\ell_1$ penalty, often control sparsity only indirectly through a regularization parameter…
Transformers with self-attention modules as their core components have become an integral architecture in modern large language and foundation models. In this paper, we study the evolution of tokens in deep encoder-only transformers at…
In this work, we present and investigate the novel blind inverse problem of position-blind ptychography, i.e., ptychographic phase retrieval without any knowledge of scan positions, which then must be recovered jointly with the image. The…
Recent advances in imaging technologies, deep learning and numerical performance have enabled non-invasive detailed analysis of artworks, supporting their documentation and conservation. In particular, automated detection of craquelure in…
A popular method to perform adversarial attacks on neuronal networks is the so-called fast gradient sign method and its iterative variant. In this paper, we interpret this method as an explicit Euler discretization of a differential…
These lecture notes evolve around mathematical concepts arising in inverse problems. We start by introducing inverse problems through examples such as differentiation, deconvolution, computed tomography and phase retrieval. This then leads…
In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual…
Consensus-based optimization (CBO) has established itself as an efficient gradient-free optimization scheme, with attractive mathematical properties, such as mean-field convergence results for non-convex loss functions. In this work, we…
The aim of this paper is to provide a mathematical analysis of transformer architectures using a self-attention mechanism with layer normalization. In particular, observed patterns in such architectures resembling either clusters or uniform…
We introduce CBXPy and ConsensusBasedX.jl, Python and Julia implementations of consensus-based interacting particle systems (CBX), which generalise consensus-based optimization methods (CBO) for global, derivative-free optimisation. The…
In this paper we prove the first quantitative convergence rates for the graph infinity Laplace equation for length scales at the connectivity threshold. In the graph-based semi-supervised learning community this equation is also known as…
This paper presents a method for finding a sparse representation of Barron functions. Specifically, given an $L^2$ function $f$, the inverse scale space flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between the…
In this paper we propose polarized consensus-based dynamics in order to make consensus-based optimization (CBO) and sampling (CBS) applicable for objective functions with several global minima or distributions with many modes, respectively.…
In this paper we investigate the use of Fourier Neural Operators (FNOs) for image classification in comparison to standard Convolutional Neural Networks (CNNs). Neural operators are a discretization-invariant generalization of neural…
Lipschitz learning is a graph-based semi-supervised learning method where one extends labels from a labeled to an unlabeled data set by solving the infinity Laplace equation on a weighted graph. In this work we prove uniform convergence…
Despite the large success of deep neural networks (DNN) in recent years, most neural networks still lack mathematical guarantees in terms of stability. For instance, DNNs are vulnerable to small or even imperceptible input perturbations, so…
We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated…
Tackling semi-supervised learning problems with graph-based methods has become a trend in recent years since graphs can represent all kinds of data and provide a suitable framework for studying continuum limits, e.g., of differential…
We propose a novel strategy for Neural Architecture Search (NAS) based on Bregman iterations. Starting from a sparse neural network our gradient-based one-shot algorithm gradually adds relevant parameters in an inverse scale space manner.…