Machine Learning - Scifaro

Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function

The sigmoid gate in mixture-of-experts (MoE) models has been empirically shown to outperform the softmax gate across several tasks, ranging from approximating feed-forward networks to language modeling. Additionally, recent efforts have…

Machine Learning · Statistics 2026-02-03 Tuan Minh Pham, Thinh Cao, Viet Nguyen, Huy Nguyen, Nhat Ho, Alessandro Rinaldo

Importance Weighted Variational Inference without the Reparameterization Trick

Importance weighted variational inference (VI) approximates densities known up to a normalizing constant by optimizing bounds that tighten with the number of Monte Carlo samples $N$. Standard optimization relies on reparameterized gradient…

Machine Learning · Statistics 2026-02-03 Kamélia Daudel, Minh-Ngoc Tran, Cheng Zhang

Online Social Welfare Function-based Resource Allocation

In many real-world settings, a centralized decision-maker must repeatedly allocate finite resources to a population over multiple time steps. Individuals who receive a resource derive some stochastic utility; to characterize the…

Machine Learning · Statistics 2026-02-03 Kanad Pardeshi, Samsara Foubert, Aarti Singh

Score-based Metropolis-Hastings for Fractional Langevin Algorithms

Sampling from heavy-tailed and multimodal distributions is challenging when neither the target density nor the proposal density can be evaluated, as in $\alpha$-stable Lévy-driven fractional Langevin algorithms. While the target…

Machine Learning · Statistics 2026-02-03 Ahmed Aloui, Junyi Liao, Ali Hasan, Jose Blanchet, Vahid Tarokh

Harmful Overfitting in Sobolev Spaces

Motivated by recent work on benign overfitting in overparameterized machine learning, we study the generalization behavior of functions in Sobolev spaces $W^{k, p}(\mathbb{R}^d)$ that perfectly fit a noisy training data set. Under…

Machine Learning · Statistics 2026-02-03 Kedar Karhadkar, Alexander Sietsema, Deanna Needell, Guido Montufar

Safety-Efficacy Trade Off: Robustness against Data-Poisoning

Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation-based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input…

Machine Learning · Statistics 2026-02-03 Diego Granziol

Hessian Spectral Analysis at Foundation Model Scale

Accurate Hessian spectra of foundation models have remained out of reach, leading most prior work to rely on small models or strong structural approximations. We show that faithful spectral analysis of the true Hessian is tractable at…

Machine Learning · Statistics 2026-02-03 Diego Granziol, Khurshid Juarev

Zero-Flow Encoders

Flow-based methods have achieved significant success in various generative modeling tasks, capturing nuanced details within complex data distributions. However, few existing works have exploited this unique capability to resolve…

Machine Learning · Statistics 2026-02-03 Yakun Wang, Leyang Wang, Song Liu, Taiji Suzuki

Sampling from multi-modal distributions on Riemannian manifolds with training-free stochastic interpolants

In this paper, we propose a general methodology for sampling from un-normalized densities defined on Riemannian manifolds, with a particular focus on multi-modal targets that remain challenging for existing sampling methods. Inspired by the…

Machine Learning · Statistics 2026-02-03 Alain Durmus, Maxence Noble, Thibaut Pellerin

Action-Free Offline-to-Online RL via Discretised State Policies

Most existing offline RL methods presume the availability of action labels within the dataset, but in many practical scenarios, actions may be missing due to privacy, storage, or sensor limitations. We formalise the setting of action-free…

Machine Learning · Statistics 2026-02-03 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Topological Residual Asymmetry for Bivariate Causal Direction

Inferring causal direction from purely observational bivariate data is fragile: many methods commit to a direction even in ambiguous or near non-identifiable regimes. We propose Topological Residual Asymmetry (TRA), a geometry-based…

Machine Learning · Statistics 2026-02-03 Mouad El Bouchattaoui

Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function,…

Machine Learning · Statistics 2026-02-03 Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey

In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated…

Machine Learning · Statistics 2026-02-03 Armando Alves Neto

Neuron Block Dynamics for XOR Classification with Zero-Margin

The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where…

Machine Learning · Statistics 2026-02-03 Guillaume Braun, Masaaki Imaizumi

Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals

Multimodal learning combines information from multiple data modalities to improve predictive performance. However, modalities often contribute unequally and in a data-dependent way, making it unclear which data modalities are genuinely…

Machine Learning · Statistics 2026-02-03 Mathew Chandy, Michael Johnson, Judong Shen, Devan V. Mehrotra, Hua Zhou, Jin Zhou, Xiaowu Dai

Test time training enhances in-context learning of nonlinear functions

Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction to adapt to the test data. While TTT has demonstrated considerable empirical success, its theoretical underpinnings…

Machine Learning · Statistics 2026-02-03 Kento Kuwataka, Taiji Suzuki

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phenomena and their implications for…

Machine Learning · Statistics 2026-02-03 Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová

When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where…

Machine Learning · Statistics 2026-02-03 Christophe Muller, Erwan Scornet, Julie Josse

Safely Learning Controlled Stochastic Dynamics

We address the problem of safely learning controlled stochastic dynamics from discrete-time trajectory observations, ensuring system trajectories remain within predefined safe regions during both training and deployment. Safety-critical…

Machine Learning · Statistics 2026-02-03 Luc Brogat-Motte, Alessandro Rudi, Riccardo Bonalli

IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas

Classical estimators, the cornerstones of statistical inference, face insurmountable challenges when applied to important emerging classes of Archimedean copulas. These models exhibit pathological properties, including numerically unstable…

Machine Learning · Statistics 2026-02-03 Agnideep Aich