Statistics
In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our…
In this paper, we develop a comprehensive asymptotic and bootstrap theory for checkerboard-based estimation of lower and upper tail copulas under unknown marginal distributions. The estimator is constructed via local bilinear (checkerboard)…
Estimating latent epidemic states and model parameters from partially observed, noisy data remains a major challenge in infectious disease modeling. State-space formulations provide a coherent probabilistic framework for such inference, yet…
Real-world measurements often comprise a dominant signal contaminated by a noisy background. Robustly estimating the dominant signal in practice has been a fundamental statistical problem. Classically, mixture models have been used to…
The K function and its related statistics have been an enduring tool in the analysis of spatial point processes, providing an easy to compute and interpret summary statistic for characterising the interactions between points of one type, or…
We extend a heuristic method for automatic dimensionality selection, which maximizes a profile likelihood to identify "elbows" in scree plots. Our extension enables researchers to make automatic choices of multiple hyper-parameters…
Vine copulas offer flexible multivariate dependence modeling and have become widely used in machine learning. Yet, structure learning remains a key challenge. Early heuristics, such as Dissmann's greedy algorithm, are still considered the…
Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of (Mousavi-Hosseini et al., 2023) analyzes a multiple index teacher-student setting and shows that a two-layer student…
Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is…
This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap, but…
Nonstationary spatial processes can often be represented as stationary processes on a warped spatial domain. Selecting an appropriate spatial warping function for a given application is often difficult and, as a result of this, warping…
We study the problem of model aggregation within the Wasserstein space for probability measures on the real line. Given a fixed finite collection of candidate probability models, we consider the associated class of Wasserstein barycenters…
Observational cohort data is an important source of information for understanding the causal effects of treatments on survival and the degree to which these effects are mediated through changes in disease-related risk factors. However,…
Scientific inference is often undermined by the vast but rarely explored "multiverse" of defensible modelling choices, which can generate results as variable as the phenomena under study. We introduce RobustiPy, an open-source Python…
Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately…
The motivation of this article is to improve inferences on the covariation in environmental exposures, motivated by data from a study of Toddlers Exposure to SVOCs in Indoor Environments (TESIE). The challenge is that the sample size is…
Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics,…
Gaussian processes (GPs) are flexible, probabilistic, nonparametric models widely used in fields such as spatial statistics and machine learning. A drawback of Gaussian processes is their computational cost, with $O(N^3)$ time and $O(N^2)$…
Climate change is a critical issue that will be in the political agenda for the next decades. While it is important for this topic to be discussed at higher levels, it is also of paramount importance that the populations became aware of the…
Many real-world decision problems require solving, again and again, combinatorial optimization instances drawn from a common distribution. A recent line of structured learning methods exploits this regularity by learning policies that pair…