Machine Learning - Scifaro

A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

Effective feature selection is critical for robust and interpretable predictive modeling in medicine, especially when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss…

Machine Learning · Statistics 2026-03-05 Agnideep Aich, Md Monzur Murshed, Sameera Hewage, Amanda Mayeaux

Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses…

Machine Learning · Statistics 2026-03-05 Seong Jin Lee, Will Wei Sun, Yufeng Liu

Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection…

Machine Learning · Statistics 2026-03-05 Alireza F. Pour, Hassan Ashtiani, Shahab Asoodeh

A Covering Framework for Offline POMDPs Learning using Belief Space Metric

In off-policy evaluation (OPE) for partially observable Markov decision processes (POMDPs), an agent must infer hidden states from past observations, which exacerbates both the curse of horizon and the curse of memory in existing OPE…

Machine Learning · Statistics 2026-03-04 Youheng Zhu, Yiping Lu

Generalized Bayes for Causal Inference

Uncertainty quantification is central to many applications of causal machine learning, yet principled Bayesian inference for causal effects remains challenging. Standard Bayesian approaches typically require specifying a probabilistic model…

Machine Learning · Statistics 2026-03-04 Emil Javurek, Dennis Frauen, Yuxin Wang, Stefan Feuerriegel

Exact Functional ANOVA Decomposition for Categorical Inputs Models

Functional ANOVA offers a principled framework for interpretability by decomposing a model's prediction into main effects and higher-order interactions. For independent features, this decomposition is well-defined, strongly linked with SHAP…

Machine Learning · Statistics 2026-03-04 Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Combinatorial Sparse PCA Beyond the Spiked Identity Model

Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^d$ is $s$-sparse. Existing sparse PCA…

Machine Learning · Statistics 2026-03-04 Syamantak Kumar, Purnamrita Sarkar, Kevin Tian, Peiyuan Zhang

Low-Degree Method Fails to Predict Robust Subspace Recovery

The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This success has led to the low-degree…

Machine Learning · Statistics 2026-03-04 He Jia, Aravindan Vijayaraghavan

Geometric structures and deviations on James' symmetric positive-definite matrix bicone domain

Symmetric positive-definite (SPD) matrix datasets play a central role across numerous scientific disciplines, including signal processing, statistics, finance, computer vision, information theory, and machine learning among others. The set…

Machine Learning · Statistics 2026-03-04 Jacek Karwowski, Frank Nielsen

A Researcher's Guide to Empirical Risk Minimization

This guide provides a reference for high-probability regret bounds in empirical risk minimization (ERM). The presentation is modular: we begin with intuition and general proof strategies, then state broadly applicable guarantees under…

Machine Learning · Statistics 2026-03-04 Lars van der Laan

Selecting Optimal Variable Order in Autoregressive Ising Models

Autoregressive models enable tractable sampling from learned probability distributions, but their performance critically depends on the variable ordering used in the factorization via complexities of the resulting conditional distributions.…

Machine Learning · Statistics 2026-03-04 Shiba Biswal, Marc Vuffray, Andrey Y. Lokhov

FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection

Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific…

Machine Learning · Statistics 2026-03-04 Jin Cui, Boran Zhao, Jiajun Xu, Jiaqi Guo, Shuo Guan, Pengju Ren

Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances

We address the problem of efficiently computing Wasserstein distances for multiple pairs of distributions drawn from a meta-distribution. To this end, we propose a fast estimation method based on regressing Wasserstein distance on sliced…

Machine Learning · Statistics 2026-03-04 Khai Nguyen, Hai Nguyen, Nhat Ho

CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

Accurate uncertainty quantification is critical for reliable predictive modeling. Existing methods typically address either aleatoric uncertainty due to measurement noise or epistemic uncertainty resulting from limited data, but not both in…

Machine Learning · Statistics 2026-03-04 Ilia Azizi, Juraj Bodik, Jakob Heiss, Bin Yu

Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

Amortized Bayesian inference (ABI) with neural networks can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, ABI is not yet sufficiently robust for widespread and safe application. When…

Machine Learning · Statistics 2026-03-04 Aayush Mishra, Daniel Habermann, Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner

Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression

Covering numbers of (deep) ReLU networks have been used to characterize approximation-theoretic performance, to upper-bound prediction error in nonparametric regression, and to quantify classification capacity. These results rely on…

Machine Learning · Statistics 2026-03-04 Weigutian Ou, Helmut Bölcskei

Proper losses regret at least 1/2-order

A fundamental challenge in machine learning is the choice of a loss as it characterizes our learning task, is minimized in the training phase, and serves as an evaluation criterion for estimators. Proper losses are commonly chosen, ensuring…

Machine Learning · Statistics 2026-03-04 Han Bao, Asuka Takatsu

Importance Weighting Correction of Regularized Least-Squares for Target Shift

Importance weighting is a standard tool for correcting distribution shift, but its statistical behavior under target shift -- where the label distribution changes between training and testing while the conditional distribution of inputs…

Machine Learning · Statistics 2026-03-04 Davit Gogolashvili

Instrumental and Proximal Causal Inference with Gaussian Processes

Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide…

Machine Learning · Statistics 2026-03-03 Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic, Edwin Fong, Siu Lun Chau

TRAKNN: Efficient Trajectory Aware Spatiotemporal kNN for Rare Meteorological Trajectory Detection

Extreme weather events, such as windstorms and heatwaves, are driven by persistent atmospheric circulation patterns that evolve over several consecutive days. While traditional circulation-based studies often focus on instantaneous…

Machine Learning · Statistics 2026-03-03 Guillaume Coulaud, Davide Faranda