Machine Learning - Scifaro

High-dimensional Analysis of Synthetic Data Selection

Despite the progress in the development of generative models, their usefulness in creating synthetic data that improve prediction performance of classifiers has been put into question. Besides heuristic principles such as "synthetic data…

Machine Learning · Statistics 2025-10-10 Parham Rezaei, Filip Kovacevic, Francesco Locatello, Marco Mondelli

Stick-Breaking Mixture Normalizing Flows with Component-Wise Tail Adaptation for Variational Inference

Normalizing flows with a Gaussian base provide a computationally efficient way to approximate posterior distributions in Bayesian inference, but they often struggle to capture complex posteriors with multimodality and heavy tails. We…

Machine Learning · Statistics 2025-10-10 Seungsu Han, Juyoung Hwang, Won Chang

On the Optimality of the Median-of-Means Estimator under Adversarial Contamination

The Median-of-Means (MoM) is a robust estimator widely used in machine learning that is known to be (minimax) optimal in scenarios where samples are i.i.d. In more grave scenarios, samples are contaminated by an adversary that can inspect…

Machine Learning · Statistics 2025-10-10 Xabier de Juan, Santiago Mazuelas

On the Optimality of Tracking Fisher Information in Adaptive Testing with Stochastic Binary Responses

We study the problem of estimating a continuous ability parameter from sequential binary responses by actively asking questions with varying difficulties, a setting that arises naturally in adaptive testing and online preference learning.…

Machine Learning · Statistics 2025-10-10 Sanghwa Kim, Dohyun Ahn, Seungki Min

Surrogate Graph Partitioning for Spatial Prediction

Spatial prediction refers to the estimation of unobserved values from spatially distributed observations. Although recent advances have improved the capacity to model diverse observation types, adoption in practice remains limited in…

Machine Learning · Statistics 2025-10-10 Yuta Shikuri, Hironori Fujisawa

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death

Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on…

Machine Learning · Statistics 2025-10-10 Sihyung Park, Wenbin Lu, Shu Yang

A pseudo-inverse of a line graph

Line graphs are an alternative representation of graphs where each vertex of the original (root) graph becomes an edge. However, not all graphs have a corresponding root graph, hence the transformation from graphs to line graphs is not…

Machine Learning · Statistics 2025-10-10 Sevvandi Kandanaarachchi, Philip Kilby, Cheng Soon Ong

Continuum Transformers Perform In-Context Learning by Operator Gradient Descent

Transformers robustly exhibit the ability to perform in-context learning, whereby their predictive accuracy on a task can increase not by parameter updates but merely with the placement of training samples in their context windows. Recent…

Machine Learning · Statistics 2025-10-10 Abhiti Mishra, Yash Patel, Ambuj Tewari

Graphon Mixtures

Social networks have a small number of large hubs, and a large number of small dense communities. We propose a generative model that captures both hub and dense structures. Based on recent results about graphons on line graphs, our model is…

Machine Learning · Statistics 2025-10-10 Sevvandi Kandanaarachchi, Cheng Soon Ong

Golden Ratio Weighting Prevents Model Collapse

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and…

Machine Learning · Statistics 2025-10-10 Hengzhi He, Shirong Xu, Guang Cheng

Latency-Aware Contextual Bandit: Application to Cryo-EM Data Collection

We introduce a latency-aware contextual bandit framework that generalizes the standard contextual bandit problem, where the learner adaptively selects arms and switches decision sets under action delays. In this setting, the learner…

Machine Learning · Statistics 2025-10-10 Lai Wei, Ambuj Tewari, Michael A. Cianfrocco

Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios

In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market…

Machine Learning · Statistics 2025-10-09 Himanshu Choudhary, Arishi Orra, Manoj Thakur

Root Cause Analysis of Outliers in Unknown Cyclic Graphs

We study the propagation of outliers in cyclic causal graphs with linear structural equations, tracing them back to one or several "root cause" nodes. We show that it is possible to identify a short list of potential root causes provided…

Machine Learning · Statistics 2025-10-09 Daniela Schkoda, Dominik Janzing

PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing

Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the…

Machine Learning · Statistics 2025-10-09 Jianhan Zhang, Jitao Wang, Chengchun Shi, John D. Piette, Donglin Zeng, Zhenke Wu

Bayesian Nonparametric Dynamical Clustering of Time Series

We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet…

Machine Learning · Statistics 2025-10-09 Adrián Pérez-Herrero, Paulo Félix, Jesús Presedo, Carl Henrik Ek

Q-Learning with Fine-Grained Gap-Dependent Regret

We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain…

Machine Learning · Statistics 2025-10-09 Haochen Zhang, Zhong Zheng, Lingzhou Xue

Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy

Online matching problems arise in many complex systems, from cloud services and online marketplaces to organ exchange networks, where timely, principled decisions are critical for maintaining high system performance. Traditional heuristics…

Machine Learning · Statistics 2025-10-09 Chiara Mignacco, Matthieu Jonckheere, Gilles Stoltz

A General Constructive Upper Bound on Shallow Neural Nets Complexity

We provide an upper bound on the number of neurons required in a shallow neural network to approximate a continuous function on a compact set with a given accuracy. This method, inspired by a specific proof of the Stone-Weierstrass theorem,…

Machine Learning · Statistics 2025-10-09 Frantisek Hakl, Vit Fojtik

Simulation-based inference via telescoping ratio estimation for trawl processes

The growing availability of large and complex datasets has increased interest in temporal stochastic processes that can capture stylized facts such as marginal skewness, non-Gaussian tails, long memory, and even non-Markovian dynamics.…

Machine Learning · Statistics 2025-10-09 Dan Leonte, Raphaël Huser, Almut E. D. Veraart