Machine Learning - Scifaro

Colored Markov Random Fields for Probabilistic Topological Modeling

Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph (nodes for variables, links for dependencies) and factorize the joint distribution into lower-dimensional components. This makes PGMs…

Machine Learning · Statistics 2025-12-04 Lorenzo Marinucci, Leonardo Di Nino, Gabriele D'Acunto, Mario Edoardo Pandolfo, Paolo Di Lorenzo, Sergio Barbarossa

Novelty detection on path space

We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that…

Machine Learning · Statistics 2025-12-04 Ioannis Gasteratos, Antoine Jacquier, Maud Lemercier, Terry Lyons, Cristopher Salvi

Iterative Tilting for Diffusion Fine-Tuning

We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a…

Machine Learning · Statistics 2025-12-04 Jean Pachebat, Giovanni Conforti, Alain Durmus, Yazid Janati

Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback

We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of…

Machine Learning · Statistics 2025-12-04 Pangpang Liu, Junwei Lu, Will Wei Sun

A note on the impossibility of conditional PAC-efficient reasoning in large language models

We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency guarantees for composite models that switch…

Machine Learning · Statistics 2025-12-04 Hao Zeng

No-Regret Gaussian Process Optimization of Time-Varying Functions

Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is…

Machine Learning · Statistics 2025-12-04 Eliabelle Mauduit, Eloïse Berthier, Andrea Simonetto

Manifold Percolation: from generative model to Reinforce learning

Generative modeling is typically framed as learning mapping rules, but from an observer's perspective without access to these rules, the task becomes disentangling the geometric support from the probability distribution. We propose that…

Machine Learning · Statistics 2025-12-04 Rui Tong

A Common Pipeline for Harmonizing Electronic Health Record Data for Translational Research

Despite the growing availability of Electronic Health Record (EHR) data, researchers often face substantial barriers in effectively using these data for translational research due to their complexity, heterogeneity, and lack of standardized…

Machine Learning · Statistics 2025-12-04 Jessica Gronsbell, Vidul Ayakulangara Panickan, Doudou Zhou, Chris Lin, Thomas Charlon, Chuan Hong, Xin Xiong, Linshanshan Wang, Jianhui Gao, Shirley Zhou, Yuan Tian, Yaqi Shi, Ziming Gan, Tianxi Cai

Locally Adaptive Conformal Inference for Operator Models

Operator models are regression algorithms between Banach spaces of functions. They have become an increasingly critical tool for spatiotemporal forecasting and physics emulation, especially in high-stakes scenarios where robust, calibrated…

Machine Learning · Statistics 2025-12-04 Trevor Harris, Yan Liu

Class conditional conformal prediction for multiple inputs by p-value aggregation

Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification…

Machine Learning · Statistics 2025-12-04 Jean-Baptiste Fermanian, Mohamed Hebiri, Joseph Salmon

Revisiting Theory of Contrastive Learning for Domain Generalization

Contrastive learning is among the most popular and powerful approaches for self-supervised representation learning, where the goal is to map semantically similar samples close together while separating dissimilar ones in the latent space.…

Machine Learning · Statistics 2025-12-03 Ali Alvandi, Mina Rezaei

Laplace Approximation For Tensor Train Kernel Machines In System Identification

To address the scalability limitations of Gaussian process (GP) regression, several approximation techniques have been proposed. One such method is based on tensor networks, which utilizes an exponential number of basis functions without…

Machine Learning · Statistics 2025-12-03 Albert Saiapin, Kim Batselier

Bayesian Physics-Informed Neural Networks for Inverse Problems (BPINN-IP): Application in Infrared Image Processing

Inverse problems arise across scientific and engineering domains, where the goal is to infer hidden parameters or physical fields from indirect and noisy observations. Classical approaches, such as variational regularization and Bayesian…

Machine Learning · Statistics 2025-12-03 Ali Mohammad-Djafari, Ning Chu, Li Wang

Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing l1 regularized approaches that approximate a…

Machine Learning · Statistics 2025-12-03 Dimitris Bertsimas, Caio de Prospero Iglesias, Nicholas A. G. Johnson

Front-door Reducibility: Reducing ADMGs to the Standard Front-door Setting via a Graphical Criterion

Front-door adjustment gives a simple closed-form identification formula under the classical front-door criterion, but its applicability is often viewed as narrow. By contrast, the general ID algorithm can identify many more causal effects…

Machine Learning · Statistics 2025-12-03 Jianqiao Mao, Max A. Little

kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions

We study a missing-value imputation method, termed kNNSampler, that imputes a given unit's missing response by randomly sampling from the observed responses of the $k$ most similar units to the given unit in terms of the observed…

Machine Learning · Statistics 2025-12-03 Parastoo Pashmchi, Jérôme Benoit, Motonobu Kanagawa

Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees

Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world…

Machine Learning · Statistics 2025-12-03 Sangwoo Park, Matteo Zecchin, Osvaldo Simeone

Anomalous Change Point Detection Using Probabilistic Predictive Coding

Change point detection (CPD) and anomaly detection (AD) are essential techniques in various fields to identify abrupt changes or abnormal data instances. However, existing methods are often constrained to univariate data, face scalability…

Machine Learning · Statistics 2025-12-03 Roelof G. Hup, Julian P. Merkofer, Alex A. Bhogal, Ruud J. G. van Sloun, Reinder Haakma, Rik Vullings

Global universal approximation of functional input maps on weighted spaces

We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family to map the input…

Machine Learning · Statistics 2025-12-03 Christa Cuchiero, Philipp Schmocker, Josef Teichmann

Fundamentals of Regression

This chapter opens with a review of classic tools for regression, a subset of machine learning that seeks to find relationships between variables. With the advent of scientific machine learning this field has moved from a purely data-driven…

Machine Learning · Statistics 2025-12-02 Miguel A. Mendez