Machine Learning - Scifaro

Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection

The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\mathbb{R}^d$. In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a…

Machine Learning · Statistics 2025-10-31 Nikita Zozoulenko, Thomas Cass, Lukas Gonon

Random pairing MLE for estimation of item parameters in Rasch model

The Rasch model, a classical model in item response theory, is widely used in psychometrics to model the relationship between individuals' latent traits and their binary responses to assessments or questionnaires. In this paper, we…

Machine Learning · Statistics 2025-10-31 Yuepeng Yang, Cong Ma

Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification

Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classification accuracy across a wide range of applications…

Machine Learning · Statistics 2025-10-30 Christopher T. Franck, Anne R. Driscoll, Zoe Szajnfarber, William H. Woodall

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with…

Machine Learning · Statistics 2025-10-30 Clément Bénard

Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees

Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without…

Machine Learning · Statistics 2025-10-30 Yuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab

Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling

We introduce the Symplectic Generative Network (SGN), a deep generative model that leverages Hamiltonian mechanics to construct an invertible, volume-preserving mapping between a latent space and the data space. By endowing the…

Machine Learning · Statistics 2025-10-30 Agnideep Aich, Ashit Aich

Continuous Domain Generalization

Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic contexts. However, existing domain generalization approaches typically treat domains as discrete or as evolving…

Machine Learning · Statistics 2025-10-30 Zekun Cai, Yiheng Yao, Guangji Bai, Renhe Jiang, Xuan Song, Ryosuke Shibasaki, Liang Zhao

The Neural Pruning Law Hypothesis

Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most current pruning methods rely on ad-hoc heuristics that are poorly understood. We introduce Hyperflux, a conceptually-grounded…

Machine Learning · Statistics 2025-10-30 Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu

Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural…

Machine Learning · Statistics 2025-10-30 Andrea Montanari, Pierfrancesco Urbani

Tracking the Median of Gradients with a Stochastic Proximal Point Method

There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. For example, domains such as distributed learning with corrupted nodes, the presence of large outliers in the training…

Machine Learning · Statistics 2025-10-30 Fabian Schaipp, Guillaume Garrigos, Umut Simsekli, Robert Gower

Transfer Learning for Kernel-based Regression

In recent years, transfer learning has garnered significant attention. Its ability to leverage knowledge from related studies to improve generalization performance in a target study has made it highly appealing. This paper focuses on…

Machine Learning · Statistics 2025-10-30 Chao Wang, Caixing Wang, Xin He, Xingdong Feng

Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders

Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is unknown. However, most methods assume all…

Machine Learning · Statistics 2025-10-30 David Bruns-Smith, Angela Zhou

Score-based constrained generative modeling via Langevin diffusions with boundary conditions

Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative…

Machine Learning · Statistics 2025-10-29 Adam Nordenhög, Akash Sharma

VIKING: Deep variational inference with stochastic projections

Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been…

Machine Learning · Statistics 2025-10-29 Samuel G. Fadel, Hrittik Roy, Nicholas Krämer, Yevgen Zainchkovskyy, Stas Syrota, Alejandro Valverde Mahou, Carl Henrik Ek, Søren Hauberg

Beyond Normality: Reliable A/B Testing with Non-Gaussian Data

A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to…

Machine Learning · Statistics 2025-10-29 Junpeng Gong, Chunkai Wang, Hao Li, Jinyong Ma, Haoxuan Li, Xu He

Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings

Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework, Counterfactual Policy Mean…

Machine Learning · Statistics 2025-10-29 Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt, Dominik Stöger

Attention-based clustering

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an…

Machine Learning · Statistics 2025-10-29 Rodrigo Maulen-Soto, Pierre Marion, Claire Boyer

Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization

In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. It is shown that these bounds are generally…

Machine Learning · Statistics 2025-10-28 Milad Sefidgaran, Kimia Nadjahi, Abdellatif Zaidi

Robust Decision Making with Partially Calibrated Forecasts

Calibration has emerged as a foundational goal in "trustworthy machine learning", in part because of its strong decision theoretic semantics. Independent of the underlying distribution, and independent of the decision maker's utility…

Machine Learning · Statistics 2025-10-28 Shayan Kiyani, Hamed Hassani, George Pappas, Aaron Roth