Machine Learning - Scifaro

Joint auto-encoders: a flexible multi-task learning framework

The incorporation of prior knowledge into learning is essential in achieving good performance based on small noisy samples. Such knowledge is often incorporated through the availability of related data arising from domains and tasks similar…

Machine Learning · Statistics 2026-02-24 Baruch Epstein, Ron Meir, Tomer Michaeli

Box Thirding: Anytime Best Arm Identification under Insufficient Sampling

We introduce Box Thirding (B3), a flexible and efficient algorithm for Best Arm Identification (BAI) under fixed-budget constraints. It is designed for both anytime BAI and scenarios with large N, where the number of arms is too large for…

Machine Learning · Statistics 2026-02-23 Seohwa Hwang, Junyong Park

On the Generalization and Robustness in Conditional Value-at-Risk

Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR…

Machine Learning · Statistics 2026-02-23 Dinesh Karthik Mulumudi, Piyushi Manupriya, Gholamali Aminian, Anant Raj

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as…

Machine Learning · Statistics 2026-02-23 Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

Interactive Learning of Single-Index Models via Stochastic Gradient Descent

Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in…

Machine Learning · Statistics 2026-02-23 Nived Rajaraman, Yanjun Han

Drift Estimation for Stochastic Differential Equations with Denoising Diffusion Models

We study the estimation of time-homogeneous drift functions in multivariate stochastic differential equations with known diffusion coefficient, from multiple trajectories observed at high frequency over a fixed time horizon. We formulate…

Machine Learning · Statistics 2026-02-23 Marcos Tapia Costa, Nikolas Kantas, George Deligiannidis

Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval

We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg…

Machine Learning · Statistics 2026-02-23 Antoine Maillard, Tony Bonnaire, Giulio Biroli

Simplex Deep Linear Discriminant Analysis

We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of…

Machine Learning · Statistics 2026-02-23 Maxat Tezekbayev, Arman Bolatov, Zhenisbek Assylbekov

Benchmarking of Clustering Validity Measures Revisited

Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different…

Machine Learning · Statistics 2026-02-23 Connor Simpson, Ricardo J. G. B. Campello, Elizabeth Stojanovski

Bayesian Neural Networks for Functional ANOVA model

With the increasing demand for interpretability in machine learning, functional ANOVA decomposition has gained renewed attention as a principled tool for breaking down high-dimensional functions into low-dimensional components that reveal…

Machine Learning · Statistics 2026-02-23 Seokhun Park, Choeun Kim, Jihu Lee, Yunseop Shin, Insung Kong, Yongdai Kim

Inference in Spreading Processes with Neural-Network Priors

Stochastic processes on graphs are a powerful tool for modelling complex dynamical systems such as epidemics. A recent line of work focused on the inference problem where one aims to estimate the state of every node at every time, starting…

Machine Learning · Statistics 2026-02-23 Davide Ghio, Fabrizio Boncoraglio, Lenka Zdeborová

Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability

We consider the fundamental problem of estimating a discrete distribution on a domain of size $K$ with high probability in Kullback-Leibler divergence. We provide upper and lower bounds on the minimax estimation rate, which show that the…

Machine Learning · Statistics 2026-02-23 Dirk van der Hoeven, Julia Olkhovskaia, Tim van Erven

CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning

Exploration remains a fundamental challenge in reinforcement learning, as many existing methods either lack theoretical guarantees or fall short in practical effectiveness. In this paper, we propose CAE, i.e., the Critic as an Explorer, a…

Machine Learning · Statistics 2026-02-23 Yexin Li

The influence of missing data mechanisms and simple missing data handling techniques on fairness

Machine learning algorithms permeate the day-to-day aspects of our lives and therefore studying the fairness of these algorithms before implementation is crucial. One way in which bias can manifest in a dataset is through missing values.…

Machine Learning · Statistics 2026-02-23 Aeysha Bhatti, Trudie Sandrock, Johane Nienkemper-Swanepoel

An AI-powered Bayesian generative modeling approach for causal inference in observational studies

Causal inference in observational studies with high-dimensional covariates presents significant challenges. We introduce CausalBGM, an AI-powered Bayesian generative modeling approach that captures the causal relationship among covariates,…

Machine Learning · Statistics 2026-02-23 Qiao Liu, Wing Hung Wong

Learning Performance Maximizing Ensembles with Explainability Guarantees

In this paper, we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation is defined as one which, for any given explainability level…

Machine Learning · Statistics 2026-02-23 Vincent Pisztora, Jia Li

Fair Community Detection and Structure Learning in Heterogeneous Graphical Models

Inference of community structure in probabilistic graphical models may not be consistent with fairness constraints when nodes have demographic attributes. Certain demographics may be over-represented in some detected communities and…

Machine Learning · Statistics 2026-02-23 Davoud Ataee Tarzanagh, Laura Balzano, Alfred O. Hero

SOLVAR: Fast covariance-based heterogeneity analysis with pose refinement for cryo-EM

Cryo-electron microscopy (cryo-EM) has emerged as a powerful technique for resolving the three-dimensional structures of macromolecules. A key challenge in cryo-EM is characterizing continuous heterogeneity, where molecules adopt a…

Machine Learning · Statistics 2026-02-20 Roey Yadgar, Roy R. Lederman, Yoel Shkolnisky

genriesz: A Python Package for Automatic Debiased Machine Learning with Generalized Riesz Regression

Efficient estimation of causal and structural parameters can be automated using the Riesz representation theorem and debiased machine learning (DML). We present genriesz, an open-source Python package that implements automatic DML and…

Machine Learning · Statistics 2026-02-20 Masahiro Kato

MGD: Moment Guided Diffusion for Maximum Entropy Generation

Generating samples from limited information is a fundamental problem across scientific domains. Classical maximum entropy methods provide principled uncertainty quantification from moment constraints but require sampling via MCMC or…

Machine Learning · Statistics 2026-02-20 Etienne Lempereur, Nathanaël Cuvelle--Magar, Florentin Coeurdoux, Stéphane Mallat, Eric Vanden-Eijnden