Machine Learning - Scifaro

PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders

Linear Autoencoders (LAEs) have shown strong performance in state-of-the-art recommender systems. However, this success remains largely empirical, with limited theoretical understanding. In this paper, we investigate the generalizability --…

Machine Learning · Statistics 2025-12-16 Ruixin Guo, Ruoming Jin, Xinyu Li, Yang Zhou

Limits To (Machine) Learning

Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG),…

Machine Learning · Statistics 2025-12-16 Zhimin Chen, Bryan Kelly, Semyon Malamud

Mind the Jumps: A Scalable Robust Local Gaussian Process for Multidimensional Response Surfaces with Discontinuities

Modeling response surfaces with abrupt jumps and discontinuities remains a major challenge across scientific and engineering domains. Although Gaussian process models excel at capturing smooth nonlinear relationships, their stationarity…

Machine Learning · Statistics 2025-12-16 Isaac Adjetey, Yiyuan She

Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization

Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy: an entropy-regularized…

Machine Learning · Statistics 2025-12-16 Jie Wang

Efficient Level-Crossing Probability Calculation for Gaussian Process Modeled Data

Almost all scientific data have uncertainties originating from different sources. Gaussian process regression (GPR) models are a natural way to model data with Gaussian-distributed uncertainties. GPR also has the benefit of reducing I/O…

Machine Learning · Statistics 2025-12-16 Haoyu Li, Isaac J Michaud, Ayan Biswas, Han-Wei Shen

Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees

Identifying the graphical structure underlying the observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to deducing a singular graph under the presumption that the observed data…

Machine Learning · Statistics 2025-12-16 Bisakh Banerjee, Mohammad Alwardat, Tapabrata Maiti, Selin Aviyente

Towards a pretrained deep learning estimator of the Linfoot informational correlation

We develop a supervised deep-learning approach to estimate mutual information between two continuous random variables. As labels, we use the Linfoot informational correlation, a transformation of mutual information that has many important…

Machine Learning · Statistics 2025-12-16 Stéphanie M. van den Berg, Ulrich Halekoh, Sören Möller, Andreas Kryger Jensen, Jacob von Bornemann Hjelmborg

Hellinger loss function for Generative Adversarial Networks

We propose Hellinger-type loss functions for training Generative Adversarial Networks (GANs), motivated by the boundedness, symmetry, and robustness properties of the Hellinger distance. We define an adversarial objective based on this…

Machine Learning · Statistics 2025-12-16 Giovanni Saraceno, Anand N. Vidyashankar, Claudio Agostinelli

Interval Fisher's Discriminant Analysis and Visualisation

In Data Science, entities are typically represented by single-valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an…

Machine Learning · Statistics 2025-12-16 Diogo Pinheiro, M. Rosário Oliveira, Igor Kravchenko, Lina Oliveira

Optimal Convergence Analysis of DDPM for General Distributions

Score-based diffusion models have achieved remarkable empirical success in generating high-quality samples from target data distributions. Among them, the Denoising Diffusion Probabilistic Model (DDPM) is one of the most widely used…

Machine Learning · Statistics 2025-12-16 Yuchen Jiao, Yuchen Zhou, Gen Li

A PyTorch Framework for Scalable Non-Crossing Quantile Regression

Quantile regression is fundamental to distributional modeling, yet independent estimation of multiple quantiles frequently produces crossing -- where estimated quantile functions violate monotonicity, implying impossible negative…

Machine Learning · Statistics 2025-12-16 Kaihua Chang

Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: "One Map, Many Trials" in Satellite-Driven Poverty Analysis

Machine learning models trained on Earth observation data, such as satellite imagery, have demonstrated significant promise in predicting household-level wealth indices, enabling the creation of high-resolution wealth maps that can be…

Machine Learning · Statistics 2025-12-16 Markus B. Pettersson, Connor T. Jerzak, Adel Daoud

Extreme mass distributions for quasi-copulas

A recent survey, nicknamed "Hitchhiker's Guide", J.J. Arias-García, R. Mesiar, and B. De Baets, A hitchhiker's guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the…

Machine Learning · Statistics 2025-12-16 Matjaž Omladič, Martin Vuk, Aljaž Zalar

Credal Prediction based on Relative Likelihood

Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner's epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction…

Machine Learning · Statistics 2025-12-16 Timo Löhr, Paul Hofman, Felix Mohr, Eyke Hüllermeier

CRPS-Based Targeted Sequential Design with Application in Chemical Space

Sequential design of real and computer experiments via Gaussian Process (GP) models has proven useful for parsimonious, goal-oriented data acquisition purposes. In this work, we focus on acquisition strategies for a GP model that needs to…

Machine Learning · Statistics 2025-12-16 Lea Friedli, Athénaïs Gautier, Anna Broccard, David Ginsbourger

Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability

$\Phi$-equilibria -- and the associated notion of $\Phi$-regret -- are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality.…

Machine Learning · Statistics 2025-12-16 Brian Hu Zhang, Ioannis Anagnostides, Emanuel Tewolde, Ratip Emin Berker, Gabriele Farina, Vincent Conitzer, Tuomas Sandholm

Multi-View Oriented GPLVM: Expressiveness and Efficiency

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome…

Machine Learning · Statistics 2025-12-16 Zi Yang, Ying Li, Zhidi Lin, Michael Minyi Zhang, Pablo M. Olmos

Self-test loss functions for learning weak-form operators and gradient flows

The construction of loss functions presents a major challenge in data-driven modeling involving weak-form operators in PDEs and gradient flows, particularly due to the need to select test functions appropriately. We address this challenge…

Machine Learning · Statistics 2025-12-16 Yuan Gao, Quanjun Lang, Fei Lu

Markov Chain Gradient Descent in Hilbert Spaces

In this paper, we study a Markov chain-based stochastic gradient algorithm in general Hilbert spaces, aiming at approximating the optimal solution of a quadratic loss function. We establish probabilistic upper bounds on its convergence. We…

Machine Learning · Statistics 2025-12-16 Priyanka Roy, Susanne Saminger-Platz

Conditional Coverage Diagnostics for Conformal Prediction

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets…

Machine Learning · Statistics 2025-12-15 Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach