Machine Learning - Scifaro

Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings

We propose a method for evaluating the robustness of widely used LLM ranking systems (variants of a Bradley–Terry model) to dropping a worst-case, very small fraction of preference data. Our approach is computationally fast and easy to…

Machine Learning · Statistics 2026-03-06 Jenny Y. Huang, Yunyi Shen, Dennis Wei, Tamara Broderick

Variational Formulation of Particle Flow

This paper provides a formulation of the log-homotopy particle flow from the perspective of variational inference. We show that the transient density used to derive the particle flow follows a time-scaled trajectory of the Fisher-Rao…

Machine Learning · Statistics 2026-03-06 Yinzhuang Yi, Jorge Cortés, Nikolay Atanasov

Generalization Bounds for Markov Algorithms through Entropy Flow Computations

Many learning algorithms can be represented as Markov processes, and understanding their generalization error is a central topic in learning theory. For specific continuous-time noisy algorithms, a prominent analysis technique relies on…

Machine Learning · Statistics 2026-03-06 Benjamin Dupuis, Maxime Haddouche, George Deligiannidis, Umut Simsekli

Semi-Supervised Generative Learning via Latent Space Distribution Matching

We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and…

Machine Learning · Statistics 2026-03-05 Kwong Yu Chong, Long Feng

Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means

Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling…

Machine Learning · Statistics 2026-03-05 Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei

Stable and Steerable Sparse Autoencoders with Weight Regularization

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we studied…

Machine Learning · Statistics 2026-03-05 Piotr Jedryszek, Oliver M. Crook

Exploiting Subgradient Sparsity in Max-Plus Neural Networks

Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and…

Machine Learning · Statistics 2026-03-05 Ikhlas Enaieh, Olivier Fercoq

Invariance-Based Dynamic Regret Minimization

We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past…

Machine Learning · Statistics 2026-03-05 Margherita Lazzaretto, Jonas Peters, Niklas Pfister

Observationally Informed Adaptive Causal Experimental Design

Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded…

Machine Learning · Statistics 2026-03-05 Erdun Gao, Liang Zhang, Jake Fawkes, Aoqi Zuo, Wenqin Liu, Haoxuan Li, Mingming Gong, Dino Sejdinovic

Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme

Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations…

Machine Learning · Statistics 2026-03-05 Zhiyuan Zhan, Masashi Sugiyama

Empirical Evaluation of No Free Lunch Violations in Permutation-Based Optimization

The No Free Lunch (NFL) theorem guarantees equal average performance only under uniform sampling of a function space closed under permutation (c.u.p.). We ask when this averaging ceases to reflect what benchmarking actually reports. We…

Machine Learning · Statistics 2026-03-05 Grzegorz Sroka

Scalable Contrastive Causal Discovery under Unknown Soft Interventions

Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only…

Machine Learning · Statistics 2026-03-05 Mingxuan Zhang, Khushi Desai, Sopho Kevlishvili, Elham Azizi

Surprisal-Rényi Free Energy

The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we…

Machine Learning · Statistics 2026-03-05 Shion Matsumoto, Raul Castillo, Benjamin Prada, Ankur Arjun Mali

Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics 2026-03-05 Xiaotong Liu, Yunwen Lei, Xiangyu Chang, Shao-Bo Lin

Learning Order Forest for Qualitative-Attribute Data Clustering

Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute…

Machine Learning · Statistics 2026-03-05 Mingjie Zhao, Sen Feng, Yiqun Zhang, Mengke Li, Yang Lu, Yiu-ming Cheung

The Theory behind UMAP?

In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an…

Machine Learning · Statistics 2026-03-05 David Wegmann

Implicit Bias of the JKO Scheme

Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO)…

Machine Learning · Statistics 2026-03-05 Peter Halmos, Boris Hanin

Best-of-$\infty$ -- Asymptotic Performance of Test-Time LLM Ensembling

We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as best-of-$\infty$. While this approach achieves impressive performance…

Machine Learning · Statistics 2026-03-05 Junpei Komiyama, Daisuke Oba, Masafumi Oyamada

Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the…

Machine Learning · Statistics 2026-03-05 Krishnakumar Balasubramanian, Nathan Ross

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration…

Machine Learning · Statistics 2026-03-05 Korel Gundem, Juncheng Dong, Dennis Zhang, Vahid Tarokh, Zhengling Qi