Machine Learning - Scifaro

Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data

We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability…

Machine Learning · Statistics 2026-05-27 Yutong Chao, Resat Gökhan, Jalal Etesami, Ali Habibnia
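To make the setting concrete, here is a minimal simulation sketch of the data-generating process the abstract describes: a rank-r latent matrix M = U Vᵀ, observed through a monotone link with additive noise, with entries missing at random. The Gaussian factors and noise, the choice of `np.tanh` as the link, the independent Bernoulli missingness, and the function name `simulate_monotone_factor_data` are all assumptions for illustration, not the paper's model specification.

```python
import numpy as np


def simulate_monotone_factor_data(n, m, r, link, noise_sd, observe_prob, seed=0):
    """Hypothetical simulator: Y = link(U V^T) + noise, with entries missing at random."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, r))          # row factors (assumed Gaussian)
    V = rng.standard_normal((m, r))          # column factors (assumed Gaussian)
    M = U @ V.T                              # rank-r latent signal
    Y = link(M) + noise_sd * rng.standard_normal((n, m))  # monotone link plus noise
    mask = rng.random((n, m)) < observe_prob # True where an entry is observed
    Y_obs = np.where(mask, Y, np.nan)        # unobserved entries become NaN
    return Y_obs, M, mask


# Usage: a 30 x 20 matrix of rank 3, tanh link, about half the entries observed.
Y_obs, M, mask = simulate_monotone_factor_data(30, 20, 3, np.tanh, 0.1, 0.5, seed=1)
print(Y_obs.shape, np.linalg.matrix_rank(M), int(mask.sum()))
```

In this setting the learner sees only `Y_obs`; neither the link nor the factors are known, which is what makes recovery nonconvex.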

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical…

Machine Learning · Statistics 2026-05-27 Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the…

Machine Learning · Statistics 2026-05-27 Tung Quoc Le, Anh Tuan Nguyen, Viet Anh Nguyen
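For readers unfamiliar with LR, the sketch below shows the classical textbook version (not the paper's data-driven method) on a pure binary program min cᵀx subject to Ax ≥ b, x ∈ {0,1}ⁿ. Relaxing Ax ≥ b with multipliers λ ≥ 0 gives L(λ) = λᵀb + Σⱼ min(0, cⱼ − (Aᵀλ)ⱼ), a lower bound on the optimum that decomposes per variable; projected subgradient ascent then tightens the bound. The step size 1/(t+1) and the function names are assumptions chosen for simplicity.

```python
import numpy as np


def lagrangian_bound(c, A, b, lam):
    """L(lam) = min over x in {0,1}^n of c@x + lam@(b - A@x), solved per variable."""
    reduced = c - A.T @ lam              # reduced costs c_j - (A^T lam)_j
    x = (reduced < 0).astype(float)      # set x_j = 1 exactly when it lowers the objective
    return lam @ b + reduced @ x, x


def subgradient_dual(c, A, b, iters=100):
    """Projected subgradient ascent on the concave dual function L(lam), lam >= 0."""
    lam = np.zeros(len(b))
    best = -np.inf
    for t in range(iters):
        value, x = lagrangian_bound(c, A, b, lam)
        best = max(best, value)          # every L(lam) is a valid lower bound
        g = b - A @ x                    # a subgradient of L at lam
        lam = np.maximum(0.0, lam + g / (t + 1))  # diminishing step, projected onto lam >= 0
    return best, lam


# Usage: pick at least 2 of 3 items with costs 2, 3, 4; the true optimum is 5.
c = np.array([2.0, 3.0, 4.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
print(subgradient_dual(c, A, b, iters=50))
```

In a decomposable MILP the per-variable minimization becomes a set of independent subproblems, which is where LR earns its keep at scale.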

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

This paper studies approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces…

Machine Learning · Statistics 2026-05-27 Weizhao Li, Fanghui Liu, Lei Shi
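A minimal sketch of the objects named in this abstract: the activation σ_s(t) = max{0, t}^s, a shallow network f(x) = Σⱼ aⱼ σ_s(⟨wⱼ, x⟩ + bⱼ), and an ℓ1 path norm. The path-norm form used here, Σⱼ |aⱼ| (‖wⱼ‖₁ + |bⱼ|), is one common convention and an assumption; the paper's exact definition and normalization may differ. Function names are hypothetical.

```python
def relu_s(t: float, s: int) -> float:
    """ReLU^s activation sigma_s(t) = max(0, t)**s (s >= 1 assumed)."""
    return max(0.0, t) ** s


def shallow_relu_s(x, weights, biases, outer, s):
    """Evaluate f(x) = sum_j a_j * sigma_s(<w_j, x> + b_j).

    Hypothetical parametrization: weights[j] = w_j, biases[j] = b_j, outer[j] = a_j.
    """
    total = 0.0
    for w, b, a in zip(weights, biases, outer):
        pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += a * relu_s(pre_activation, s)
    return total


def l1_path_norm(weights, biases, outer):
    """Assumed l1 path norm: sum_j |a_j| * (||w_j||_1 + |b_j|)."""
    return sum(
        abs(a) * (sum(abs(wi) for wi in w) + abs(b))
        for w, b, a in zip(weights, biases, outer)
    )


# Usage: two neurons in R^2 with s = 2.
W = [[1.0, -1.0], [0.5, 2.0]]
B = [0.0, -1.0]
A = [2.0, -1.0]
print(shallow_relu_s([1.0, 0.5], W, B, A, s=2))  # 0.25
print(l1_path_norm(W, B, A))                      # 7.5
```

Because σ_s is positively homogeneous of degree s, rescaling (wⱼ, bⱼ) by c > 0 and aⱼ by c⁻ˢ leaves f unchanged, which is one reason path-style norms rather than raw weight norms are the natural complexity measure.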

Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift

We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincare inequality first reduces temporal risk volatility to derivative energy. A Jacobian-velocity theorem then supplies the corresponding…

Machine Learning · Statistics 2026-05-27 Jonathan R. Landers
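The "time-domain Poincaré inequality" step can be illustrated with the classical one-dimensional Poincaré–Wirtinger inequality, stated here under the assumption that the deployment risk r(t) is absolutely continuous on [0, T] with r' square-integrable; the paper's exact statement and constants may differ. The second line shows one way derivative energy could be tied to drift velocity, assuming r(t) = R(θ(t)) for a differentiable risk R of a covariate-distribution parameter θ(t) (a hypothetical parametrization), via the chain rule and Cauchy–Schwarz.

```latex
\int_0^T \bigl(r(t) - \bar r\bigr)^2 \, dt \;\le\; \frac{T^2}{\pi^2} \int_0^T r'(t)^2 \, dt,
\qquad \bar r = \frac{1}{T} \int_0^T r(t) \, dt,

r'(t) = \nabla R\bigl(\theta(t)\bigr)^{\top} \dot\theta(t)
\quad\Longrightarrow\quad
r'(t)^2 \le \bigl\| \nabla R\bigl(\theta(t)\bigr) \bigr\|^2 \, \bigl\| \dot\theta(t) \bigr\|^2 .
```

Read together, temporal risk volatility is bounded by a sensitivity term (the gradient) times a drift-speed term (the velocity), which matches the Jacobian-velocity structure the abstract alludes to.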

Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

We consider debiased inference on finite-dimensional functionals of infinite-dimensional least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not…

Machine Learning · Statistics 2026-05-27 Zikai Shen, Nathan Kallus, Dimitri Meunier, Houssam Zenati, Arthur Gretton, Aurélien Bibaut

Assessing Per-Sample Membership Inference Vulnerability without Retraining

Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the…

Machine Learning · Statistics 2026-05-27 Valentin Dorseuil, Jamal Atif, Olivier Cappé

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of…

Machine Learning · Statistics 2026-05-27 Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat

Fast Spectrum Estimation of Some Kernel Matrices

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It…

Machine Learning · Statistics 2026-05-27 Mikhail Lepilov

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and…

Machine Learning · Statistics 2026-05-27 Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta
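As background, the sketch below uses the classical (non-adaptive) energy distance, 2E‖X − Y‖ − E‖X − X′‖ − E‖Y − Y′‖, estimated with V-statistics and Euclidean norms. It classifies a point z by the energy distance between a point mass at z and each class sample. This is a swapped-in baseline for illustration; the paper's data-adaptive distance replaces the Euclidean norm with something else, and the function names here are hypothetical.

```python
import math


def mean_pairwise(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs, y in ys."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """V-statistic estimate of 2E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return 2 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)


def classify(z, classes):
    """Assign z to the label whose sample is closest in energy distance to a point mass at z."""
    def score(sample):
        # For a point mass, E|z - z'| = 0, so only two terms remain.
        return 2 * mean_pairwise([z], sample) - mean_pairwise(sample, sample)

    return min(classes, key=lambda label: score(classes[label]))


# Usage: two tiny classes in R^2.
classes = {
    "a": [(0.0, 0.0), (0.0, 1.0)],
    "b": [(5.0, 5.0), (5.0, 6.0)],
}
print(classify((0.2, 0.3), classes))  # a
print(energy_distance(classes["a"], classes["b"]) > 0)  # True
```

The within-class term E‖X − X′‖ is what distinguishes this rule from plain average-distance classification; in HDLSS settings, Euclidean distances concentrate, which is the usual motivation for data-adaptive alternatives.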

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent…

Machine Learning · Statistics 2026-05-26 Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics 2026-05-26 Jose Blanchet, Peter Glynn, Wenhao Yang

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression…

Machine Learning · Statistics 2026-05-26 Sam Bowyer, Acyr Locatelli, Kris Cao
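The reframing in this abstract can be sketched as follows: rows are previously evaluated models, columns are per-question scores, and the target is each model's full benchmark score. A subset of questions (the features) is chosen, and ordinary least squares with an intercept maps subset scores to the full score. Greedy forward selection on training error is an assumed selection rule for illustration, not necessarily the one the authors use; all function names are hypothetical.

```python
import numpy as np


def _design(responses, subset):
    """Intercept column plus the selected question columns."""
    return np.column_stack([np.ones(responses.shape[0]), responses[:, subset]])


def fit_subset_regression(responses, full_scores, subset):
    """OLS fit: full_score ~ b0 + sum_{q in subset} b_q * responses[:, q]."""
    coef, *_ = np.linalg.lstsq(_design(responses, subset), full_scores, rcond=None)
    return coef


def predict(coef, responses, subset):
    return _design(responses, subset) @ coef


def greedy_forward_selection(responses, full_scores, k):
    """Add, one at a time, the question that most reduces training squared error."""
    chosen = []
    for _ in range(k):
        best_q, best_err = None, np.inf
        for q in range(responses.shape[1]):
            if q in chosen:
                continue
            candidate = chosen + [q]
            coef = fit_subset_regression(responses, full_scores, candidate)
            err = np.sum((predict(coef, responses, candidate) - full_scores) ** 2)
            if err < best_err:
                best_q, best_err = q, err
        chosen.append(best_q)
    return chosen


# Usage: 20 synthetic "models", 6 questions; the full score depends only on question 2.
rng = np.random.default_rng(0)
R = rng.random((20, 6))
y = 0.1 + 0.8 * R[:, 2]
subset = greedy_forward_selection(R, y, 1)
print(subset, fit_subset_regression(R, y, subset))
```

A new model is then evaluated only on the chosen questions and its full score is estimated with `predict`; in practice one would select the subset on held-out error rather than training error.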

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the…

Machine Learning · Statistics 2026-05-26 Yuan-Hao Wei

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates…

Machine Learning · Statistics 2026-05-26 Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear…

Machine Learning · Statistics 2026-05-26 Joongkyu Lee, Min-hwan Oh
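For reference, the standard MNL choice model with an outside (no-purchase) option assigns P(i | S) = exp(θᵢ) / (1 + Σ_{j∈S} exp(θⱼ)) to each offered item. The sketch below computes these probabilities and finds a best size-K assortment by brute force. Using expected revenue Σᵢ rᵢ P(i | S) as the objective and the per-item revenues are assumptions for illustration; the paper's notion of "best assortment" may be defined differently, and function names are hypothetical.

```python
import itertools
import math


def mnl_choice_probs(utilities, assortment):
    """MNL probabilities for each offered item; key None is the outside option."""
    denom = 1.0 + sum(math.exp(utilities[i]) for i in assortment)
    probs = {i: math.exp(utilities[i]) / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(utilities, revenues, assortment):
    """Assumed objective: sum_i r_i * P(i | S)."""
    probs = mnl_choice_probs(utilities, assortment)
    return sum(revenues[i] * probs[i] for i in assortment)


def best_assortment(utilities, revenues, K):
    """Brute force over all size-K subsets of the N items (exponential in N)."""
    items = range(len(utilities))
    return max(
        itertools.combinations(items, K),
        key=lambda S: expected_revenue(utilities, revenues, S),
    )


# Usage: three equally attractive items with different revenues, K = 2.
print(mnl_choice_probs([0.0, 0.0, 0.0], (0, 1)))
print(best_assortment([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 2))  # (1, 2)
```

Each round of feedback reveals only the single chosen item (or the outside option), which is why the experimental design question of which subsets to offer matters for identifying the best one.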

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including…

Machine Learning · Statistics 2026-05-26 Joongkyu Lee, Min-hwan Oh

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=U\Lambda U^{\top}$. The spectrum $\Lambda$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace…

Machine Learning · Statistics 2026-05-26 Hideitsu Hino, Keisuke Yano
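The spectral facts in this abstract can be checked numerically. For an L-ensemble DPP with eigenvalues λ₁, …, λₙ of L, the cardinality distribution is P(|Y| = k) = e_k(λ) / ∏ᵢ (1 + λᵢ), where e_k is the k-th elementary symmetric polynomial, and the k-DPP conditions on |Y| = k, giving P_k(Y) = det(L_Y) / e_k(λ). The sketch below computes e_k with the standard one-pass recurrence; function names are hypothetical, and L is assumed symmetric positive semidefinite.

```python
import numpy as np


def elementary_symmetric(eigs, k_max):
    """e_0..e_{k_max} of eigs via e_k <- e_k + lambda * e_{k-1}, updated in reverse order."""
    e = np.zeros(k_max + 1)
    e[0] = 1.0
    for lam in eigs:
        for k in range(k_max, 0, -1):
            e[k] += lam * e[k - 1]
    return e


def cardinality_distribution(L):
    """P(|Y| = k) for k = 0..n under the L-ensemble DPP; depends only on the spectrum."""
    eigs = np.linalg.eigvalsh(L)
    e = elementary_symmetric(eigs, len(eigs))
    return e / np.prod(1.0 + eigs)


def k_dpp_prob(L, subset):
    """P_k(Y = subset) = det(L_Y) / e_k(lambda), with k = len(subset)."""
    k = len(subset)
    e_k = elementary_symmetric(np.linalg.eigvalsh(L), k)[k]
    return np.linalg.det(L[np.ix_(subset, subset)]) / e_k


# Usage: eigenvalues of this L are 1, 1, 3, so e = (1, 5, 7, 3) and prod(1 + lambda) = 16.
L = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
print(cardinality_distribution(L))  # approx [1/16, 5/16, 7/16, 3/16]
print(k_dpp_prob(L, [0, 1]))        # approx 3/7
```

Note that `cardinality_distribution` uses only Λ, while `k_dpp_prob` needs the full matrix (equivalently U), mirroring the split between spectrum and eigenspace described in the abstract.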

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and…

Machine Learning · Statistics 2026-05-26 Xifeng Zhang, Jin Zhao

Mean-Shift PCA by Knockoff Mean

Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noisy components from PCA by deliberately introducing knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in…

Machine Learning · Statistics 2026-05-26 Mengda Li, Zeng Li, Jianfeng Yao