Machine Learning - Scifaro

Extrapolation Guarantees for Perturbation Modeling Under the Additive Latent Shift Assumption

We consider the problem of modeling the effects of perturbations like gene knockouts on measurements such as single-cell RNA counts. Given data for some perturbations, we aim to predict the distribution of measurements for new combinations…

Machine Learning · Statistics 2026-05-18 Julius von Kügelgen, Jakob Ketterer, Michael Vollenweider, Michael Scholkemper, Xinwei Shen, Nicolai Meinshausen, Jonas Peters

Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics

Detecting localized differences between two samples is a central task in scientific data analysis, required for the identification of signal events, regime changes, or model mismatch. We introduce EagleEye, a method that pinpoints local…

Machine Learning · Statistics 2026-05-18 Sebastian Springer, Andre Scaffidi, Maximilian Autenrieth, Gabriella Contardo, Alessandro Laio, Roberto Trotta, Heikki Haario

Interpretability of Graph Neural Networks to Assess Effects of Global Change Drivers on Ecological Networks

Pollinators play a crucial role in plant reproduction, both in natural ecosystems and in human-modified landscapes. Global change drivers, including climate change or land use modifications, can alter plant-pollinator interactions. To…

Machine Learning · Statistics 2026-05-18 Emre Anakok, Pierre Barbillon, Colin Fontaine, Elisa Thebault

From XAI to MLOps: Explainable Concept Drift Detection with Profile Drift Detection

Predictive models often degrade in performance due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is…

Machine Learning · Statistics 2026-05-18 Ugur Dar, Mustafa Cavus

Density Estimation via Binless Multidimensional Integration

We introduce the Binless Multidimensional Thermodynamic Integration (BMTI) method for nonparametric, robust, and data-efficient density estimation. BMTI estimates the logarithm of the density by initially computing log-density differences…

Machine Learning · Statistics 2026-05-18 Matteo Carli, Alex Rodriguez, Alessandro Laio, Aldo Glielmo

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

Feature attribution analysis is critical for interpreting machine learning models and supporting reliable data-driven decisions. However, feature attribution measures often exhibit stochastic variation: different train-test splits, random…

Machine Learning · Statistics 2026-05-15 Lanxin Xiang, Liang Shi, Youhui Ye, Boyu Jiang, Dawei Zhou, Feng Guo

From Data to Action: Accelerating Refinery Optimization with AI

Refinery optimization today relies on vast amounts of data, which modern Linear Programming (LP) software can handle, but interpreting and applying the results remains challenging. Large petrochemical companies use massive…

Machine Learning · Statistics 2026-05-15 Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth, Tamás Zoltán Varga, Márk Czifra, Botond Szilágyi, Edith Alice Kovács

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

We study a prototypical situation when a learned predictor can discover useful low-dimensional structure in data, while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a…

Machine Learning · Statistics 2026-05-15 Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

K-Models: a Flexible and Interpretable Method for Ordinal Clustering with Application to Antigen-Antibody Interaction Profiles

Existing clustering methods for functional data often prioritize partitioning accuracy over interpretability, making it challenging to extract meaningful insights when the data-generating process follows a specific underlying structure and…

Machine Learning · Statistics 2026-05-15 Giulia Patanè, Alessandra Menafoglio, Alexander Krauth, Peter Fechner, Luca Dede', Bianca Maria Colosimo, Federica Nicolussi

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination…

Machine Learning · Statistics 2026-05-15 Arie Wortsman-Zurich, Hugo Tabanelli, Yatin Dandi, Florent Krzakala, Bruno Loureiro

Large Dimensional Kernel Ridge Regression: Extending to Product Kernels

Recent studies have reported saturation effects and multiple descent behavior in large dimensional kernel ridge regression (KRR). However, these findings are predominantly derived under restrictive settings, such as…

Machine Learning · Statistics 2026-05-15 Yang Zhou, Yicheng Li, Yuqian Cheng, Qian Lin

Training-Free Generative Sampling via Moment-Matched Score Smoothing

Diffusion models generate samples by denoising along the score of a perturbed target distribution. In practice, one trains a neural diffusion model, which is computationally expensive. Recent work suggests that score matching implicitly…

Machine Learning · Statistics 2026-05-15 Zhenyu Yao, Daniel Paulin

To discretize continually: Mean shift interacting particle systems for Bayesian inference

Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples --…

Machine Learning · Statistics 2026-05-15 Ayoub Belhadji, Daniel Sharp, Youssef M. Marzouk

Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning

Chain-of-thought (CoT) reasoning with self-consistency improves performance by aggregating multiple sampled reasoning paths. In this setting, correctness is no longer tied to a single reasoning trace but to the aggregation rule over a pool…

Machine Learning · Statistics 2026-05-15 Yu Gu, Zijun Yu, Vahid Partovi Nia, Masoud Asgharian

A Regret Perspective on Online Multiple Testing

Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and…

Machine Learning · Statistics 2026-05-15 Qingyang Hao, Kongchang Zhou, Fang Kong, Hongxin Wei

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, which converts low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI…

Machine Learning · Statistics 2026-05-15 Lingchao Zheng, Yuwei Fan, Jun Li, Chengqiu Hu, Qichen Liao, Junyi Fan, Rui Shi, Fangzheng Miao

A Survey on Data-Dependent Worst-Case Generalization Bounds

Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space…

Machine Learning · Statistics 2026-05-15 Hubert Leroux, Jean Marcus, Julien Roger

Covariance-aware sampling for Diffusion Models

We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of…

Machine Learning · Statistics 2026-05-15 Andrea Schioppa, Tim Salimans

AIS: Adaptive Importance Sampling for Quantized RL

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce…

Machine Learning · Statistics 2026-05-15 Jiajun Zhou, Wei Shao, Lingchao Zheng, Yuwei Fan, Ngai Wong

Generalizing Score-based generative models for Heavy-tailed Distributions

Score-based generative models (SGMs) have achieved remarkable empirical success, motivating their application to a broad range of data distributions. However, extending them to heavy-tailed targets remains a largely open problem. Although…

Machine Learning · Statistics 2026-05-15 Tiziano Fassina, Gabriel Cardoso, Sylvan Le Corff, Thomas Romary