Related papers: Modeling the Machine Learning Multiverse
Through case studies, we demonstrate how multiverse analysis can strengthen the robustness and transparency of computational social science findings against alternative methodological decisions. We conduct multiverse analyses of three…
When building statistical models for Bayesian data analysis tasks, required and optional iterative adjustments and different modelling choices can give rise to numerous candidate models. In particular, checks and evaluations throughout the…
Ensuring robust model performance in diverse real-world scenarios requires addressing generalizability across domains with covariate shifts. However, no formal procedure exists for statistically evaluating generalizability in machine…
Although bibliometrics has become an essential tool in the evaluation of research performance, bibliometric analyses are sensitive to a range of methodological choices. Subtle choices in data selection, indicator construction, and modeling…
Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency, and ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the…
A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users…
Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that…
We consider a Gaussian process formulation of the multiple kernel learning problem. The goal is to select the convex combination of kernel matrices that best explains the data and by doing so improve the generalisation on unseen data.…
The methodology developed in this article is motivated by a wide range of prediction and uncertainty quantification problems that arise in Statistics, Machine Learning and Applied Mathematics, such as non-parametric regression, multi-class…
Time series machine learning (TSML) is a growing research field that spans a wide range of tasks. The popularity of established tasks such as classification, clustering, and extrinsic regression has, in part, been driven by the availability…
As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of…
The recent availability of huge, many-dimensional data sets, like those arising from genome-wide association studies (GWAS), provides many opportunities for strengthening causal inference. One popular approach is to utilize these…
Bayesian optimization is a popular tool for data-efficient optimization of expensive objective functions. In real-life applications like engineering design, the designer often wants to take multiple objectives as well as input uncertainty…
A general framework for dealing with both linear regression and clustering problems is described. It includes Gaussian clusterwise linear regression analysis with random covariates and cluster analysis via Gaussian mixture models with…
The present paper proposes a Bayesian framework for inverse problems that seamlessly integrates optimization and inversion to enable rapid surrogate modeling, accurate parameter inference, and rigorous uncertainty quantification. Bayesian…
Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying…
Echoing recent calls to counter reliability and robustness concerns in machine learning via multiverse analysis, we present PRESTO, a principled framework for mapping the multiverse of machine-learning models that rely on latent…
We propose a robust adversarial prediction framework for general multiclass classification. Our method seeks predictive distributions that robustly optimize non-convex and non-continuous multiclass loss metrics against the worst-case…
Supervised learning is all about the ability to generalize knowledge. Specifically, the goal of the learning is to train a classifier using training data, in such a way that it will be capable of classifying new unseen data correctly. In…
Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations such as Spath's clusterwise regression that attempt to…