Machine Learning | Scifaro

Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications

Most statistical models for pairwise comparisons, including the Bradley-Terry (BT) and Thurstone models and many extensions, make a relatively strong assumption of stochastic transitivity. This assumption imposes the existence of an…

Machine Learning · Statistics · 2026-03-12 · Sze Ming Lee, Yunxiao Chen

Conditional Local Importance by Quantile Expectations

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current popular methods, including…

Machine Learning · Statistics · 2026-03-12 · Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

Losing dimensions: Geometric memorization in generative diffusion

Diffusion models power leading generative AI, but when and how they memorize training data, especially on low-dimensional manifolds, remains unclear. We find memorization emerges gradually, not abruptly: as data become scarce, diffusion…

Machine Learning · Statistics · 2026-03-12 · Beatrice Achilli, Enrico Ventura, Gianluigi Silvestri, Bao Pham, Gabriel Raya, Dmitry Krotov, Carlo Lucibello, Luca Ambrogioni

a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

The traditional Triangulated Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized datasets. Here we identify key memory and…

Machine Learning · Statistics · 2026-03-11 · Lionel Yelibi

What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow,…

Machine Learning · Statistics · 2026-03-11 · Nicolás Della Penna

On Regret Bounds of Thompson Sampling for Bayesian Optimization

We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) with…

Machine Learning · Statistics · 2026-03-11 · Shion Takeno, Shogo Iwazaki

A Generative Sampler for distributions with possible discrete parameter based on Reversibility

Learning to sample from complex unnormalized distributions is a fundamental challenge in computational physics and machine learning. While score-based and variational methods have achieved success in continuous domains, extending them to…

Machine Learning · Statistics · 2026-03-11 · Lei Li, Zhen Wang, Lishuo Zhang

Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems

We verify that persistent observers in causally invariant hypergraph substrates satisfy the conditions of the Conant-Ashby Good Regulator Theorem. Building on Wolfram's hypergraph physics and Vanchurin's neural network cosmology, we…

Machine Learning · Statistics · 2026-03-11 · Max Zhuravlev

Statistical Inference via Generative Models: Flow Matching and Causal Inference

Generative AI has achieved remarkable empirical success, but from the perspective of statistics it often remains opaque: its predictions may be accurate, yet the underlying mechanism is difficult to interpret, analyze, and trust. This book…

Machine Learning · Statistics · 2026-03-11 · Shinto Eguchi

Towards Reliable Simulation-based Inference

Scientific knowledge expands by observing the world, hypothesizing some theories about it, and testing them against collected data. When those theories take the form of statistical models, statistical analyses are involved in the process of…

Machine Learning · Statistics · 2026-03-11 · Arnaud Delaunoy

Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series

Multivariate time series (MTS) modeling often implicitly imposes an artificial ordering over variables, violating the inherent exchangeability found in many real-world systems where no canonical variable axis exists. We formalize this…

Machine Learning · Statistics · 2026-03-11 · Seungwoo Jeong, Heung-Il Suk

Robust Assortment Optimization from Observational Data

Assortment optimization is a fundamental challenge in modern retail and recommendation systems, where the goal is to select a subset of products that maximizes expected revenue under complex customer choice behaviors. While recent advances…

Machine Learning · Statistics · 2026-03-11 · Miao Lu, Yuxuan Han, Han Zhong, Zhengyuan Zhou, Jose Blanchet

An AI-powered Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A), where (X_A, X_B) is an arbitrary partition of the observed variable X. Existing approaches are either restricted to a fixed conditioning structure or depend…

Machine Learning · Statistics · 2026-03-11 · Qiao Liu, Wing Hung Wong

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to…

Machine Learning · Statistics · 2026-03-11 · Chenyu Zhang, Navid Azizan

Repulsive Monte Carlo on the sphere for the sliced Wasserstein distance

In this paper, we consider the problem of computing the integral of a function on the unit sphere, in any dimension, using Monte Carlo methods. Although the methods we present are general, our guiding thread is the sliced Wasserstein…

Machine Learning · Statistics · 2026-03-11 · Vladimir Petrovic, Rémi Bardenet, Agnès Desolneux

Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain…

Machine Learning · Statistics · 2026-03-11 · Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Motivated by real-world settings where data collection and policy deployment -- whether for a single agent or across multiple agents -- are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL…

Machine Learning · Statistics · 2026-03-11 · Haochen Zhang, Zhong Zheng, Lingzhou Xue

Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation

Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method…

Machine Learning · Statistics · 2026-03-10 · Adam Rozzio, Rafael Athanasiades, O. Deniz Akyildiz

Generative Adversarial Regression (GAR): Learning Conditional Risk Scenarios

We propose Generative Adversarial Regression (GAR), a framework for learning conditional risk scenarios through generators aligned with downstream risk objectives. GAR builds on a regression characterization of conditional risk for…

Machine Learning · Statistics · 2026-03-10 · Saeed Asadi, Jonathan Yu-Meng Li

Unifying On- and Off-Policy Variance Reduction Methods

Continuous and efficient experimentation is key to the practical success of user-facing applications on the web, both through online A/B-tests and off-policy evaluation. Despite their shared objective -- estimating the incremental value of…

Machine Learning · Statistics · 2026-03-10 · Olivier Jeunen