Machine Learning - Scifaro

Perturbations in the Orthogonal Complement Subspace for Efficient Out-of-Distribution Detection

Out-of-distribution (OOD) detection is essential for deploying deep learning models in open-world environments. Existing approaches, such as energy-based scoring and gradient-projection methods, typically rely on high-dimensional…

Machine Learning · Statistics 2025-11-04 Zhexiao Huang, Weihao He, Shutao Deng, Junzhe Chen, Chao Yuan, Hongxin Wang, Changsheng Zhou

SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations

The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited to, ranking-and-selection for finite alternatives…

Machine Learning · Statistics 2025-11-04 Haoting Zhang, Haoxian Chen, Donglin Zhan, Hanyang Zhao, Henry Lam, Wenpin Tang, David Yao, Zeyu Zheng

Accuracy estimation of neural networks by extreme value theory

Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we…

Machine Learning · Statistics 2025-11-04 Gero Junike, Marco Oesting

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models as…

Machine Learning · Statistics 2025-11-04 Krishna Prasath Logakannan, Shridhar Vashishtha, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby

Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data

Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through…

Machine Learning · Statistics 2025-11-04 Mitchell L. Prevett, Francis K. C. Hui, Zhi Yang Tho, A. H. Welsh, Anton H. Westveld

MMbeddings: Parameter-Efficient, Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models

We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theory with modern deep learning. By treating…

Machine Learning · Statistics 2025-11-04 Giora Simchoni, Saharon Rosset

Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso

I develop an algorithm to produce the piecewise quadratic that computes leave-one-out cross-validation for the lasso as a function of its hyperparameter. The algorithm can be used to find exact hyperparameters that optimize leave-one-out…

Machine Learning · Statistics 2025-11-04 Ryan Burn

Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres

Recent advances in flow-based generative modelling have provided scalable methods for computing the Schrödinger Bridge (SB) between distributions, a dynamic form of entropy-regularised Optimal Transport (OT) for the quadratic cost. The…

Machine Learning · Statistics 2025-11-04 Samuel Howard, Peter Potaptchik, George Deligiannidis

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective,…

Machine Learning · Statistics 2025-11-04 Jingfeng Wu, Pierre Marion, Peter Bartlett

Characterization and Learning of Causal Graphs from Hard Interventions

A fundamental challenge in the empirical sciences involves uncovering causal structure through observation and experimentation. Causal discovery entails linking the conditional independence (CI) invariances in observational data to their…

Machine Learning · Statistics 2025-11-04 Zihan Zhou, Muhammad Qasim Elahi, Murat Kocaoglu

Behavior of prediction performance metrics with rare events

Objective: Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC might be a misleading measure of prediction…

Machine Learning · Statistics 2025-11-04 Emily Minus, R. Yates Coley, Susan M. Shortreed, Brian D. Williamson

Differential privacy guarantees of Markov chain Monte Carlo algorithms

This paper aims to provide differential privacy (DP) guarantees for Markov chain Monte Carlo (MCMC) algorithms. In a first part, we establish DP guarantees on samples output by MCMC algorithms as well as Monte Carlo estimators associated…

Machine Learning · Statistics 2025-11-04 Andrea Bertazzi, Tim Johnston, Gareth O. Roberts, Alain Durmus

Wait-Less Offline Tuning and Re-solving for Online Decision Making

Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate…

Machine Learning · Statistics 2025-11-04 Jingruo Sun, Wenzhi Gao, Ellen Vitercik, Yinyu Ye

Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity

Out-of-distribution (OOD) detection is essential for ensuring the reliability and safety of machine learning systems. In recent years, it has received increasing attention, particularly through post-hoc detection and training-based methods.…

Machine Learning · Statistics 2025-11-04 Mouïn Ben Ammar, David Brellmann, Arturo Mendoza, Antoine Manzanera, Gianni Franchi

Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix

Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\text{KL}(q||p)$.…

Machine Learning · Statistics 2025-11-04 Charles C. Margossian, Lawrence K. Saul

Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials

We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction. We adopt a set of prior distributions that resolve identifiability issues that may arise between the parameters in the…

Machine Learning · Statistics 2025-11-04 Antonia A. L. Dos Santos, Danilo A. Sarti, Rafael A. Moral, Andrew C. Parnell

Minimax-Optimal Two-Sample Test with Sliced Wasserstein

We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees…

Machine Learning · Statistics 2025-11-03 Binh Thuan Tran, Nicolas Schreuder

Interpretable Model-Aware Counterfactual Explanations for Random Forest

Despite their enormous predictive power, machine learning models are often unsuitable for applications in regulated industries such as finance, due to their limited capacity to provide explanations. While model-agnostic frameworks such as…

Machine Learning · Statistics 2025-11-03 Joshua S. Harvey, Guanchao Feng, Sai Anusha Meesala, Tina Zhao, Dhagash Mehta

On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields

The Flow Matching (FM) method in generative modeling maps between arbitrary probability distributions by constructing an interpolation between them and then learning the vector field that defines the ODE for this interpolation. Recently, it was shown that…

Machine Learning · Statistics 2025-11-03 Nikita Kornilov, Alexander Korotin

Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport

Adding entropic regularization to Optimal Transport (OT) problems has become a standard approach for designing efficient and scalable solvers. However, regularization introduces a bias from the true solution. To mitigate this bias while…

Machine Learning · Statistics 2025-11-03 Ferdinand Genans, Antoine Godichon-Baggioni, François-Xavier Vialard, Olivier Wintenberger