Related papers: Comparing Foundation Models using Data Kernels

Kernel Mean Embedding Based Hypothesis Tests for Comparing Spatial Point Patterns

This paper introduces an approach for detecting differences in the first-order structures of spatial point patterns. The proposed approach leverages the kernel mean embedding in a novel way by introducing its approximate version tailored to…

Methodology · Statistics 2020-06-15 Raif M. Rustamov, James T. Klosowski

Towards an Explainable Comparison and Alignment of Feature Embeddings

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable…

Machine Learning · Computer Science 2025-08-19 Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia

When is an Embedding Model More Promising than Another?

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on…

Machine Learning · Computer Science 2024-11-19 Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida

Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples

Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics. Interviewing 13 embedding users across…

Human-Computer Interaction · Computer Science 2022-03-07 Angie Boggust, Brandon Carter, Arvind Satyanarayan

Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding

Graph kernels are widely used for measuring the similarity between graphs. Many existing graph kernels, which focus on local patterns within graphs rather than their global properties, suffer from significant structure information loss when…

Machine Learning · Computer Science 2019-12-02 Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang, Kun Xu, Liang Zhao, Xi Peng, Yinglong Xia, Charu Aggarwal

Foundation models in brief: A historical, socio-technical focus

Foundation models can be disruptive for future AI development by scaling up deep learning in terms of model size and training data's breadth and size. These models achieve state-of-the-art performance (often through further adaptation) on a…

Artificial Intelligence · Computer Science 2022-12-20 Johannes Schneider

Alignment Based Kernel Learning with a Continuous Set of Base Kernels

The success of kernel-based learning methods depends on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce…

Machine Learning · Computer Science 2011-12-21 Arash Afkanpour, Csaba Szepesvari, Michael Bowling

Metric and non-metric proximity transformations at linear costs

Domain-specific (dis-)similarity or proximity measures, used e.g. in alignment algorithms for sequence data, are popular for analyzing complex data objects and for capturing domain-specific data properties. Without an underlying vector space these…

Data Structures and Algorithms · Computer Science 2014-11-07 Andrej Gisbrecht, Frank-Michael Schleif

Composite Goodness-of-fit Tests with Kernels

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more…

Machine Learning · Statistics 2025-04-22 Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces

Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties,…

Machine Learning · Statistics 2025-06-11 Marcos Matabuena, Rahul Ghosal, Pavlo Mozharovskyi, Oscar Hernan Madrid Padilla, Jukka-Pekka Onnela

Learning new physics efficiently with nonparametric methods

We present a machine learning approach for model-independent new physics searches. The corresponding algorithm is powered by recent large-scale implementations of kernel methods, nonparametric learning algorithms that can approximate any…

High Energy Physics - Phenomenology · Physics 2022-10-17 Marco Letizia, Gianvito Losapio, Marco Rando, Gaia Grosso, Andrea Wulzer, Maurizio Pierini, Marco Zanetti, Lorenzo Rosasco

On the workflow, opportunities and challenges of developing foundation model in geophysics

Foundation models, as a mainstream technology in artificial intelligence, have demonstrated immense potential across various domains in recent years, particularly in handling complex tasks and multimodal data. In the field of geophysics,…

Geophysics · Physics 2025-04-28 Hanlin Sheng, Xinming Wu, Hang Gao, Haibin Di, Sergey Fomel, Jintao Li, Xu Si

Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models

Pretrained foundation models learn embeddings that can be used for a wide range of downstream tasks. These embeddings optimise general performance, and if insufficiently accurate at a specific task the model can be fine-tuned to improve…

Machine Learning · Computer Science 2025-02-20 Matthew P. Wilson, Edward O. Pyzer-Knapp, Nicolas Galichet, Luke Dicks

Fast and Scalable Multi-Kernel Encoder Classifier

This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding,…

Machine Learning · Computer Science 2024-11-12 Cencheng Shen

A Survey of Foundation Models for Environmental Science

Modeling environmental ecosystems is essential for effective resource management, sustainable development, and understanding complex ecological processes. However, traditional methods frequently struggle with the inherent complexity,…

Machine Learning · Computer Science 2025-03-06 Runlong Yu, Shengyu Chen, Yiqun Xie, Xiaowei Jia

A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs

Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a…

Computation and Language · Computer Science 2020-07-21 Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, Chengkai Li

Graph embedding using multi-layer adjacent point merging model

For graph classification tasks, many traditional kernel methods focus on measuring the similarity between graphs. These methods have achieved great success in resolving graph isomorphism problems. However, in some classification problems,…

Machine Learning · Computer Science 2021-02-18 Jianming Huang, Hiroyuki Kasai

Amortized Bayesian model comparison with evidential deep learning

Comparing competing mathematical models of complex natural processes is a shared goal among many branches of science. The Bayesian probabilistic framework offers a principled way to perform model comparison and extract useful metrics for…

Machine Learning · Statistics 2021-03-03 Stefan T. Radev, Marco D'Alessandro, Ulf K. Mertens, Andreas Voss, Ullrich Köthe, Paul-Christian Bürkner

A Process for the Evaluation of Node Embedding Methods in the Context of Node Classification

Node embedding methods find latent lower-dimensional representations which are used as features in machine learning models. In the last few years, these methods have become extremely popular as a replacement for manual feature engineering.…

Social and Information Networks · Computer Science 2020-06-01 Christoph Martin, Meike Riebeling

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample…

Machine Learning · Statistics 2024-06-27 Cencheng Shen, Joshua T. Vogelstein