Machine Learning - Scifaro

Dimension-free error estimate for diffusion model and optimal scheduling

Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of an Ornstein-Uhlenbeck (OU) process initialized at…

Machine Learning · Statistics 2025-12-02 Valentin de Bortoli, Romuald Elie, Anna Kazeykina, Zhenjie Ren, Jiacheng Zhang

Decision Tree Embedding by Leaf-Means

Decision trees and random forests remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretability. However, a single tree suffers from high…

Machine Learning · Statistics 2025-12-02 Cencheng Shen, Yuexiao Dong, Carey E. Priebe

Common Structure Discovery in Collections of Bipartite Networks: Application to Pollination Systems

Bipartite networks are widely used to encode ecological interactions. Being able to compare the organization of bipartite networks is a first step toward a better understanding of how environmental factors shape community structure and…

Machine Learning · Statistics 2025-12-02 Louis Lacoste, Pierre Barbillon, Sophie Donnet

LPCD: Unified Framework from Layer-Wise to Submodule Quantization

Post-training quantization (PTQ) aims to preserve model-level behavior; however, most methods focus on individual linear layers. Even recent extensions, such as QEP and LoaQ, which mitigate error propagation or target specific submodules,…

Machine Learning · Statistics 2025-12-02 Yuma Ichikawa, Yudai Fujimoto, Akira Sakai

Implicitly Normalized Online PCA: A Regularized Algorithm with Exact High-Dimensional Dynamics

Many online learning algorithms, including classical online PCA methods, enforce explicit normalization steps that discard the evolving norm of the parameter vector. We show that this norm can in fact encode meaningful information about the…

Machine Learning · Statistics 2025-12-02 Samet Demir, Zafer Dogan

High-dimensional Mean-Field Games by Particle-based Flow Matching

Mean-field games (MFGs) study the Nash equilibrium of systems with a continuum of interacting agents, which can be formulated as the fixed-point of optimal control problems. They provide a unified framework for a variety of applications,…

Machine Learning · Statistics 2025-12-02 Jiajia Yu, Junghwan Lee, Yao Xie, Xiuyuan Cheng

Discriminative classification with generative features: bridging Naive Bayes and logistic regression

We introduce Smart Bayes, a new classification framework that bridges generative and discriminative modeling by integrating likelihood-ratio-based generative features into a logistic-regression-style discriminative classifier. From the…

Machine Learning · Statistics 2025-12-02 Zachary Terner, Alexander Petersen, Yuedong Wang

An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

Principal Component Analysis (PCA) and K-means constitute fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially…

Machine Learning · Statistics 2025-12-02 Victor Saquicela, Kenneth Palacio-Baus, Mario Chifla

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh

Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression

We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is,…

Machine Learning · Statistics 2025-12-02 Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Fröhlich, Karim Lounici, Massimiliano Pontil, Arthur Gretton

Restricted Block Permutation for Two-Sample Testing

We study a structured permutation scheme for two-sample testing that restricts permutations to single cross-swaps between block-selected representatives. Our analysis yields three main results. First, we provide an exact validity…

Machine Learning · Statistics 2025-12-02 Jungwoo Ho

Self-sufficient Independent Component Analysis via KL Minimizing Flows

We study the problem of learning disentangled signals from data using non-linear Independent Component Analysis (ICA). Motivated by advances in self-supervised learning, we propose to learn self-sufficient signals: A recovered signal should…

Machine Learning · Statistics 2025-12-02 Song Liu

Statistical-computational gap in multiple Gaussian graph alignment

We investigate the existence of a statistical-computational gap in multiple Gaussian graph alignment. We first generalize a previously established informational threshold from Vassaux and Massoulié (2025) to regimes where the number of…

Machine Learning · Statistics 2025-12-02 Bertrand Even, Luca Ganassali

An RKHS Perspective on Tree Ensembles

Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictions are obtained by aggregating many randomized…

Machine Learning · Statistics 2025-12-02 Mehdi Dagdoug, Clement Dombry, Jean-Jil Duchamps

Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification

Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain; even well-calibrated models provide no mechanism to identify which specific predictions are unreliable. We develop a geometric…

Machine Learning · Statistics 2025-12-02 Soumojit Das, Nairanjana Dasgupta, Prashanta Dutta

When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing

Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite…

Machine Learning · Statistics 2025-12-02 Mousam Sinha, Tirtha Sarathi Ghosh, Ridam Pal

General Pruning Criteria for Fast SBL

Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated…

Machine Learning · Statistics 2025-12-02 Jakob Möderl, Erik Leitinger, Bernard Henri Fleury

Non-stationary Bandit Convex Optimization: A Comprehensive Study

Bandit Convex Optimization is a fundamental class of sequential decision-making problems, where the learner selects actions from a continuous domain and observes a loss (but not its gradient) at only one point per round. We study this…

Machine Learning · Statistics 2025-12-02 Xiaoqi Liu, Dorian Baudry, Julian Zimmert, Patrick Rebeschini, Arya Akhavan

Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems

Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces - where such problems are naturally formulated - is crucial to ensure stability and convergence as the…

Machine Learning · Statistics 2025-12-02 Lorenzo Baldassari, Josselin Garnier, Knut Solna, Maarten V. de Hoop

Convergence of Shallow ReLU Networks on Weakly Interacting Data

We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to…

Machine Learning · Statistics 2025-12-02 Léo Dana, Francis Bach, Loucas Pillaud-Vivien