Related papers: Comparing Foundation Models using Data Kernels
This paper introduces an approach for detecting differences in the first-order structures of spatial point patterns. The proposed approach leverages the kernel mean embedding in a novel way by introducing its approximate version tailored to…
While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable…
Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on…
Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics. Interviewing 13 embedding users across…
Graph kernels are widely used for measuring the similarity between graphs. Many existing graph kernels, which focus on local patterns within graphs rather than their global properties, suffer from significant structure information loss when…
Foundation models can be disruptive for future AI development by scaling up deep learning in terms of model size and training data's breadth and size. These models achieve state-of-the-art performance (often through further adaptation) on a…
The success of kernel-based learning methods depend on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce…
Domain specific (dis-)similarity or proximity measures used e.g. in alignment algorithms of sequence data, are popular to analyze complex data objects and to cover domain specific data properties. Without an underlying vector space these…
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…
Depth measures are powerful tools for defining level sets in emerging, non--standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties,…
We present a machine learning approach for model-independent new physics searches. The corresponding algorithm is powered by recent large-scale implementations of kernel methods, nonparametric learning algorithms that can approximate any…
Foundation models, as a mainstream technology in artificial intelligence, have demonstrated immense potential across various domains in recent years, particularly in handling complex tasks and multimodal data. In the field of geophysics,…
Pretrained foundation models learn embeddings that can be used for a wide range of downstream tasks. These embeddings optimise general performance, and if insufficiently accurate at a specific task the model can be fine-tuned to improve…
This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding,…
Modeling environmental ecosystems is essential for effective resource management, sustainable development, and understanding complex ecological processes. However, traditional methods frequently struggle with the inherent complexity,…
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a…
For graph classification tasks, many traditional kernel methods focus on measuring the similarity between graphs. These methods have achieved great success on resolving graph isomorphism problems. However, in some classification problems,…
Comparing competing mathematical models of complex natural processes is a shared goal among many branches of science. The Bayesian probabilistic framework offers a principled way to perform model comparison and extract useful metrics for…
Node embedding methods find latent lower-dimensional representations which are used as features in machine learning models. In the last few years, these methods have become extremely popular as a replacement for manual feature engineering.…
Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample…